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Abstract 

Computational modules early in the human vision system typically generate sparse information 
about the shapes of visible surfaces in the scene. Moreover, visual processes such as stereopsis can 
provide such information at a number of levels spanning a range of resolutions. In this paper, we 
extend this multi-level structure to encompass the subsequent task of reconstructing full surface 
descriptions from the sparse information. The mathematical development proceeds in three steps. 
First, the surface most consistent with the sparse constraints is characterized as the solution to an 
optimal approximation problem which is posed as a variational principle describing the constrained 
equilibrium state of a thin flexible plate. Second, local, finite element representations of surfaces 
are introduced and, by applying the finite element method, the continuous variational principle 
is transformed into a discrete problem in the form of a large system of linear algebraic equations 
whose solution is computable by local-support, cooperative mechanisms. Third, to exploit the 
information available at each level of resolution, a hierarchy of discrete problems is formulated 
and a highly efficient multi-level algorithm, involving both intra-level relaxation processes and 
bi-directional inter-level local interpolation processes is applied to their simultaneous solution. 
Examples of the generation of hierarchies of surface representations from stereo constraints are 
given. Finally, the basic surface approximation problem is revisited in a broader mathematical 
context whose implications are of relevance to vision. 
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1. INTRODUCTION 


A fundamental problem in early vision is that of inferring the three-dimensional geometry 
of visible surfaces in a scene from the intensity information available in the retinal images. 
It seems that advanced biological vision systems subdivide this surprisingly difficult problem, 
distributing its solution over a number of subsystems or modules which contribute to the recovery 
of information about surface shape. Each module specializes in the interpretation of specific classes 
of image cues and, to a first approximation, it performs its task independently of the other 
subsystems. Examples of primary modules which have been identified include those responsible 
for stereo vision and the perception of motion. 

The computational framework set forth by Marr [Marr, 1976, 1982; Marr and Poggio, 1977] 
has had a strong influence in the understanding of early vision. In particular, it has provided a 
paradigm w'hich dictates that the modules initially be characterized in terms of the visual tasks 
which they must perform and, subsequently, that they be studied in terms of the computational 
processes through which they perform these tasks. The various computational processes in early 
vision transform symbolic representations of images into symbolic representations of surfaces 
over several stages of analysis. An understanding of some of the details of these processes and 
representations has evolved recently, especially at the stages closer to the image, where much 
of the work has been inspired by recent advances in neuroscience ([Marr, 1976], [Marr and 
Hildreth, 1980], [Marr and Poggio, 1979], [Ullman 1979a], etc.). On the other hand, our insights 
at stages closer to explicit surface and volumetric representations are meager. For example, 
these fundamental questions remain open: first, how is the information generated by the various 
modules, amalgamated into representations of surfaces (see, e.g., [Nishihara, 1981]) and, second, 
how do such representations give rise in turn to representations of the three-dimensional properties 
of objects in the scene (see, e.g., [Brooks, 1981], [Marr and Nishihara, 1978], and [Nevatia and 
Binford, 1977]). 

The work described in this paper is part of ongoing research into the problem of obtaining 
surface representations which will be of use to later processing stages in vision. In the context of 
Marr s framework, our goal is to analyze the process through which the sparse information retrieved 
by, say, stereopsis or analysis of motion is combined and transformed into full, retinocentric 
descriptions of surface shape consistent with our perception when we look around us. In particular, 
we will argue that multiple, full surface representations spanning a range of resolutions are desirable 
and, indeed, show that they may be generated as an integral part of a highly-efficient, multi-level 
surface reconstruction algorithm. Moreover, our approach seems sufficiently general to allow several 
classes of surface shape information (such as local depth or orientation) provided by a number of 
vision modules to be merged in a meaningful way. 
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1.1. Motivation of the Multi-Level Approach 

To clarify our intentions, we will first examine Marr’s framework for early vision in some 
detail. The framework is characterized by at least three major processing stages, each of which 
transforms one retinocentric represent;.tion into another with the purpose of inferring, and making 
explicit, relevant information about the surfaces in a scene. The first stage transforms the intensity 
representations (or retinal images) into a primary representation, called the primal skeich [Marr, 
1976]. Changes in the physical properties of surfaces almost always give rise to intensity changes 
in the images, and it is at the level of the primal sketch that the locations of these changes are 
made explicit. In the second processing stage, specialized processes, such as those concerned with 
stereo and shape from motion, infer information about the shape of surfaces from the contents of 
the primal sketch. Since inferences can typically be made only at those locations which have been 
marked in the primal sketch, the information generated is sparse, and it is collected into sparse 
representations of surface shape that are referred to as the raw 2 \-D sketch. The final stage is 
one of full surface reconstruction in which the sparse representations are transformed into a full 
2 2 ~ D sketch containing explicit information about surface shape at all points in the scene. 

The goal of the first processing stage is the detection of intensity changes in the image. 
Recently, Marr and Hildreth [Marr and Hildreth, 1980; Hildreth, 1980] proposed a theory of edge 
detection which was inspired by existing neurophysiological evidence and certain mathematical 
issues. An important aspect of this theory is that intensity changes in the images must be 
isolated at different scales of resolution. Indeed, there is evidence that the human visual system 
detects intensity changes over a range of resolutions through me use of up to five independent, 
spatial-frequency-tuned, bandpass channels [Campbell and Robson, 1968; Wilson and Giese, 1977; 
Wilson and Bergen, 1979; Marr, Poggio, and Hildreth, 1980]. The existence of these independent 
primal sketch representations is a crucial factor which contributes to the success of some later 
computations such as stereopsis, as modeled by the Marr-Poggio theory of stereo vision [Marr and 
P°ggi°> 1979] (see also [Mayhew and Frisby, 1980, 1981]). According to this model, the bandpass 
nature of the channels leads to an almost trivial solution to the stereo correspondence problem 
within the disparity range of each channel. Detailed depth information over a wide disparity range 
is obtained through a process by which the coarser channels control vergence eye movements 
that bring the finer channels into alignment (general studies of vergence eye movements include 
[Riggs and Niehl, 1960] and [Rashbass and Westheimer, 1961]). On the other hand, computations 
such as motion correspondence [Ullman, 1979a], whose function may not depend critically on 
the existence of multiple representations, may nevertheless be operative at each of the levels. It 
seems likely in any case that multiple sparse representations of surface shape that span a range 
of resolutions are generated by most of these modules. 

In the context of stereopsis, Grimson [1981a, 1982a] pioneered the mathematical theory 
of the subsequent visual surface reconstruction process which transforms the sparse surface 
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Figure 1. The stereo module with single-level surface reconstruction. 


LEFT EYE 


RIGHT EYE 



descriptions into full ones. He proposed that, before reconstruction begins, the multiple, sparse 
depth representations output through the different bandpass channels be combined into a single 
raw 2^-D sketch in a way which maintains consistency across all scales. The raw 2|-D sketch then 
contains sparse depth information at the finest resolution possible. Next, a single reconstruction 
process operating at this finest level generates a unique full 2£-D sketch representing depth 
information at high resolution. The steps are illustrated in Figure 1, in which only three bandpass 
channels are shown for simplicity. 
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A single full surface representation at the finest resolution possible certainly captures all 
of the information provided by the stereo module and it should, in principle, be sufficient input 
to later tasks. Unfortunately, a number of technical problems arise with this simple approach. 
First, in collapsing the multiple depth representations into one raw 2^-D sketch, information 
potentially useful in later processing stages which are concerned with object-centered surface 
descriptions, 3-D models of objects, and object recognition has been discarded prematurely. It 
now seems likely that in order for some of these later stages to succeed and work efficiently, 
surface representations at multiple scales will be necessary, just as they are necessary at earlier 
stages such as stereopsis. In accordance with Marr’s principle of least commitment [Marr, 1976], it 
would be wasteful to discard information, prior to surface reconstruction, which may have to be 
regenerated later. A second, and more immediately serious problem is a consequence of the great 
bulk of incoming information within the large raw 2^-D sketch which must be processed at the 
finest resolution. Biologically feasible surface reconstruction algorithms such as those developed 
by Grimson are extremely inefficient at generating full surface descriptions when faced with such 
large representations. Roughly speaking, the primary reason for this inefficiency is due to the 
local nature of the algorithms in question. 

The above problems may be avoided if the sparse representations are not collapsed into 
a single fine representation. Instead, multiple full surface representations spanning a range of 
resolutions ought to be generated by the reconstruction process itself and made available to 
processing stages beyond. The multi-level surface reconstruction algorithm which we will develop 
in this paper accomplishes precisely this. Because the algorithm exploits information available at 
coarser resolutions, its speed efficiency is dramatically superior to that of single level reconstruction 
schemes. Order-of-magnitude improvements are typically observed for surfaces reconstructed from 
information provided by stereopsis. On the other hand, the expense in space in maintaining all 
the coarser representations is very worthwhile since it turns out to be only a fraction of that 
required to maintain the finest one. 

bigure 2 illustrates the multi-level surface reconstruction scheme and its incorporation into 
stereopsis. A fundamental point to realize about the multi-level approach in general is that 
information about surface depth, or for that matter surface orientation, is provided in each of 
the channels (i.e., sparse representations) by the various vision modules and, as will be shown, 
contributes in an optimal way to the generation of the hierarchy of full surface representations. 
The multi-level scheme involves both intra-level processes which propagate information across a 
single representation, as well as inter-level processes which communicate between representations. 
The inter-level processes are further classified into those which transfer information from coarser 
levels to finer ones, and those which transfer information from finer levels to coarser ones. At 
this point, we emphasize that multiple representations of consistent accuracy can be achieved 
only if such a bi-directional flow of information is allowed to take place between the levels. This 
statement will be substantiated rigorously in a later section. 
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Figure 2. Multi-level approach to surface reconstruction. 
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If the processes and representations envisioned are to be considered as models of the 
human visual system, their form is constrained from below by what can be implemented in 
neuronal hardware. Although our incomplete knowledge renders premature the formulation of 
precise arguments along these lines, some constraints which presently seem compelling, such 
as parallelism, locality and simplicity of computation, efficiency, uniformity, and extensibility 
[Ullman, 1979b] have been factors in the theoretical analysis of the surface reconstruction problem 
and in the search for algorithms (similar constraints are issues when considering implementations 
within parallel, pipelined computer architectures). Once a specific algorithm is selected based 
on these constraints, it may be implemented and its performance can be evaluated empirically 
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in terras of the original computational goal. If the performance is consistent with psychological 
evidence, the algorithm may be thought of as constituting an existence proof that the theory 
adequately models the human vision system. This is what we are ultimately striving for in our 
study of the surface reconstruction problem. 

1.2. Overview 

In this paper, we lay down the mathematical foundations of a multi-level approach to visual 
surface reconstruction, primarily in .he context of stereo vision. With the help of a physical 
model, the basic surface approximation problem is given an intuitive interpret?tion. Although it is 
in general a nontrivial matter to solve this problem, our model suggests the application of potent 
methods which have arisen out of classical mathematical physics — the calculus of variations, 
optimal approximation theory, and functional analysis. Aspects of the above formalisms are 
employed to render our problem amenable to solution by numerical techniques. The development 
is as follows. 

• In Chapter 2, visual surface reconstruction is cast as an optimal approximation problem 
which involves finding the equilibrium position of a thin flexible plate undergoing bending. 
The problem is posed formally as a variational principle which we propose to solve by first 
converting it to discrete form using the finite element method. 

• In Chapter 3, we prepare the way for applying this discrete approximation method by 
finding a set of minimal conditions for our continuous problem to have a unique solution. 
We show that these requirements will almost always be satisfied in practice, so that we can 
consider our surface approximation problem to be well-posed, and can proceed to obtain the 
solution. 


In Chapter 4, we turn to the task of converting our continuous problem into discrete 
form. To do so, we define a simple nonconforming finite element which will constitute the 
basis of our local, piecewise continuous representation of surfaces. Because the element is 
nonconforming, we first must prove that it leads to unique discrete approximations, and 
that these approximations converge to the exact solution as the elements decrease in size. 
Having done this, we derive the discrete surface approximation problem as a large system 
of linear equations. 

In Chapter 5, we face the task of solving this linear system efficiently in a biologically-feasible 
way, and it is here that we motivate the multi-level approach for obtaining the solution. 
The approach involves setting up a hierarchy of discrete surface approximation problems 
which span a range of resolutions and exploit the information available at each scale, and 
subsequently invoking a multi-level algorithm to solve them simultaneously. We demonstrate 
the efficient performance of the multi-level surface reconstruction algorithm on constraints 
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from stereopsis, and demonstrate that it generates a useful hierarchy of accurate surface 
representations. 

• In Chapter 6, we reexamine our surface reconstruction problem and show that it is a special 
case within a general class of optimal interpolation problems involving arbitrary degrees of 
continuity, in any number of dimensions. These general problems involve the minimization 
of functionals which possess a number of invariance properties making them attractive for 
application to problems in early vision whose solutions require the iterative propagation of 
smoothness constraints across retinocentric representations. 

• In Chapter 7, we conclude by discussing the overall implications of our approach to issues 
concerning the isolation of depth discontinuities, and the incorporation of other sources of 
information such as surface orientation. We discuss possible solutions to these problems in 

view of our finite element representation of surfaces and the multi-level surface reconstruction 
algorithm. 

• For convenience, in two of the appendices, we cover the relevant mathematical background 
of the finite element method and the iterative solution of large linear systems. 

2. THE MOST CONSISTENT SURFACE 

The sparse information about surface shape retrieved by the various vision modules is in 
general underconstraining, J hat is to say, it is insufficient to compel a unique inference of the 
physical properties of the surfaces in the scene. Yet, even when presented with impoverished 
stimuli (such as random dot stereograms [Julesz, 1971] or kinetic depth effect displays [Wallach 
and O’Connell, 1953; Wallach, 1959; Johansson, 1975; Ullman, 1979a; etc.]), the human visual 
system routinely arrives at unique interpretations and our typical perception is a stable one of 
full surfaces in depth. Clearly, the visual system must invoke certain assumptions which provide 
enough additional constraint to allow a unique full surface representation to be computed from 
the sparse information provided. However, these additional assumptions must be plausible in that 
they reflect certain domain-dependent expectations. For example, in stereo vision, the sparse 
information takes the form of depth constraints which embody measurements of the distances 
from the viewer to the surfaces of objects in the scene. The additional assumptions should then 
be based on general expectations about physical properties of surfaces in the visual world, as well 

as aspects of the optical and computational processes taking part in the generation of the depth 
constraints. 

Crimson [1981a] explored a number of issues along these fines. Qualitatively, his thesis is 
as follows. A surface in the scene which varies radically in shape usually gives rise to intensity 
changes which are marked in the primal sketches as zero-crossings of the Laplacian of the 
Gaussian-convolved images ( W 2 G*I — see [Marr and Hildreth, 1980; Hildreth, 1980]). Moreover, 
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it is only at the locations of zero-crossings that the Marr-Poggio stereo algorithm can generate 
measurements of the distance to the surface, in the form of explicit depth constraints. Therefore, 
the surface cannot in general be varying radically in depth between the constraints to which it 
gave rise. By introducing this additional surface smoothness assumption, the goal of accurately 
reconstructing the shape of visible surfaces and thereby computing full surface representations 
consistent with our perception is attainable in principle. A theoretical proof of this statement lies 
in the domain of mathematics. In the next section, we take the first step by rigorously formulating 
the surface reconstruction problem as an optimal approximation problem in which the smoothness 
assumption has a clear intuitive interj retation, and which eventually leads to efficient multi-level 
algorithms for its solution. 

2.1. A Physical Interpretation — The Bending of a Thin Plate 

» 

Visual surface reconstruction can be characterized formally as a constrained, optimal 
approximation problem in two dimensions. In the context of stereo vision, where constraints 
embody depth measurements to surfaces in the scene, the goal is to reconstruct, as accurately as 
possible, the shape of the surface which gave rise to these measurements. Of course, it is necessary 
that we be able to deal with the complication arising from arbitrarily-placed constraints since, as 
has been argued, constraints of this type are generated naturally by stereopsis and other vision 
modules. More rigorously, the problem can be stated as follows: given a finite set of arbitrarily 
located distinct point constraints on a plane, each constraint having a real scalar value associated 
with it, find the unique optimal function of two variables which is most consistent with these 
constraints. Our notion of consistency will be defined shortly. We consider the solution to our 
problem to be a full surface representation in that it makes explicit our best estimate of the 
distance to every visible point on the surface in the scene. 

The constraints provided by the stereo computation are never completely reliable. Errors due 
to noise, and errors in matching corresponding zero-crossings are bound to occur. This suggests 
that we should not try to interpolate the given data exactly because a few “bad” constraints 
can have a detrimental effect on the shape of the recovered surface. Relaxing the interpolation 
requirement turns our problem into one of surface approximation in which we would like to 
maintain control over how closely the surface fits the data. 

By thinking in terms of an optimal surface, we imply the choice of a suitable criterion that 
will allow us to measure the optimality of admissible functions. A suitable criterion for measuring 
the optimality of surfaces in the context of surface approximation in stereopsis translates into 
a precise mathematical statement which captures intuitive notions about the smoothness of 
admissible surfaces as well as their closeness of fit to the known depth constraints. Perhaps the 
intuitively clearest treatment of our problem is in terms of a physical model. Consider a planar 
region Cl, the region within which we wish to obtain an optimal approximating surface most 
consistent with a finite set of sparse constraints. Let us imagine that the constraints constitute a 
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Figure 3. Tdie physical model for surface approximation. 



set of vertical pins scattered inside Q, the height of an individual pin being related to the distance 
from the viewer to the surface in the scene. Suppose that we take a thin flexible plate of elastic 
material that is planar in the absence of external forces, and constrain it to pass near the tips of 
the pins by attaching ideal springs between the pin tips and the surface of the plate as shown in 

Figure 3. It is not difficult to imagine the equilibrium position of the plate as a function of the 
various pin heights. 

Intuitively, the equilibrium position of the thin plate is a “fair” approximating surface in 
that it will exhibit a sufficient amount of smoothness between the constraints. Moreover, on 
quantitative grounds, there is evidence to suggest that such a surface is indeed an optimal one in 
terms of the imaging process [Grimson, 1982b]. In any case, we have a reasonable physical model 

for the optimal approximating surfaces and, moreover, this model will suggest good strategies for 
solving our problem. 

We emphasize however that the appropriateness of the model depends on two important 
issues. The first involves ensuring that a unique solution exists, and the second is to guarantee 
that the solution is meaningful in view of the constraints. Firstly, we realize that the plate-spring 
system will be unstable for certain pin configurations. If we have but a single pin, then a stable 
equilibrium does not exist, as the plate has two unconstrained degrees of freedom (rotation about 
the axis of the pin is excluded). A similar degenerate situation arises for the case of any number 
of pins arranged linearly, the plate then having one unconstrained degree of freedom. Clearly 
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at least three noncollinearly arranged pins are required to assure that a unique state of stable 
equilibrium exists. Secondly, a reasonable choice must be made for the stiffness of the springs. 
If the springs are too strong in relation to the rigidity of the plate material, then a pin whose 
height deviates significantly from that of its neighbors (i.e. an erroneous constraint) will place an 
abnormally large amount of strain on the plate locally and have undesirable effects on the shape 
of the surface. On the other hand, if the springs are too weak, the intrinsic rigidity of the plate 
can overwhelm them and the plate will remain nearly planar over large variations in the height 
of the pins. In the limit of a rigid plate, the resulting planar, least-squares approximation would 
be meaningless in that the solution does not in general lie close to constraints other than those 
arising from nearly flat surfaces. 

We will now proceed to a mathematical characterization of the above physical model. To do 
so, we apply the well-known minimum potential energy principle from classical mechanics, which 
states that the potential energy of a physical system in a state of stable equilibrium is at a local 
minimum. For the model, the potential energy in question is that due to deformation of the plate 
and springs, as well as the energy imparted by any externally applied forces. In this sense then, 
the surface we seek is one of minimal energy. 

First, consider the plate. It is known (see, e.g., [Courant and Hilbert 1953, pg. 250], [Landau 
and Lifshitz, 1970]) that the potential energy of a thin plate under deformation is given by an 
integral of a quadratic form in the principle curvatures of the plate. If the principle curvatures 
of the deformed plate are denoted by and k,- 2 , the potential energy density is given by an 
expression of the form 

f (*? + 4 ) + B« lK2 = 2A(ii±^) 2 — [A — B) KlK2 , 

where A and B are constants determined by the plate material. The expression ^ + * 2 ) is the 
first or mean curvature and /C[/t 2 is the second or Gaussian curvature of the plate’s surface (see, 
e.g., [Hilbert and Cohn-Vossen, 1952]). 

Let the function v(x, y ) denote the deflection of the plate normal to the region fi which can 
be taken to he in the X-Y plane. Assuming that the deflection function and its partial derivatives, 
v,v x ,v yt ... are small, it can be shown (see e.g. [Rektorys, 1969, pg. 368]) that 

K i + *2 1 A 9 

--« ~ Av, Ki k 2 ^v xx Vyy — v‘ y , 


where A g x 2 T~ $y2 denotes the Laplacian operator. Thus, the potential energy density can be 
written in the following forms: 
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e l = \{& V f - (1 - <r)(Vx X Vyy — V \ y ) 

W yy) a ( V xxVyy V xy) (1) 

= \ a ( Av f + C 1 ~ <r)(v 2 xx + 2v 2 xy + v 2 yy ) , 

apart from a multiplicative constant which depends on the physical properties of the elastic 
material of the plate, and which has been set to unity without loss of generality. The constant cr, 
called the Poisson ratio , measures the change in width as the material is stretched lengthwise. 1 
The desired potential energy of deformation is obtained by integrating the energy density over 
the domain in question, and is given by 

= / f n - (1 - <r)(v xx Vyy — v 2 xy ) dxdy. 

To the potential energy of deformation, we must add the potential energy due to any 
external forces which may be present. The energy due to a force density g(x, y) applied to the 
surface of the plate (such as the effect of gravity) is given by 




JL 


gvdx dy. 


External forces and bending moments may also be applied around the boundary <90 of 0. The 
energy due to a force density p(s) on the boundary (s denotes arc length along the boundary) is 

£s[v) — — / p(s)vds, 

Jan 

while the energy due to bending moments applied around the boundary is 


<f 4 (u) = — f m(s) 

Jan 


dv 

dn 


ds, 


where m(s) is the density of applied bending moments normal to the curve and denotes the 
directional derivative along the outward normal to <90. 


Finally, we must account for the potential energy of deformation of the springs. Let C 
denote the set of points in 0 at which the imaginary pins are located; that is, the sparse set of 
locations at which the surface is constrained. Furthermore, denote the height of the pin (the value 
of the constraint) by real scalars C( XuVi ) and the stiffness of the spring attached to it (influence 
of the constraint) by positive constants 0 (x<>|w) for all (x t ,y t ) £ C . According to Hooke’s law for 
an ideal spring, the total potential energy of deformation in the springs is given by 


From the last expression in (1), it is apparent that the potential energy density may be considered 
to be a convex combination with parameter <7 of the square of the Laplacian a;d a quadratic term in 
second-order partial derivatives of the deflection function, (v 2 xx + 2v 2 xy + This fact will be used in 
a subsequent discussion. 
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^( W ) = \ Y Vi) - c^.y,)] 2 - 

( Xi,yi)ec 

Ihe equilibrium state of the mechanical system can be obtained as the solution to the 
following minimization problem which is referred to as a variational principle. 

The deflection of the plate at equilibrium is that function u from a set V of admissible 

functions v, for which the total potential energy 

£{v) — <fi(v) -f- e 2 (v) -f- <ra(v) -|- S 4 {v) -f- £${v) (2) 

is minimal. 

Thus, quantitatively, the ’’most consistent” surface which we seek is the one having minimal 
energy E. 

The visual surface approximation problem has been posed, in integral form, as a variational 
principle which is quadratic in that it involves terms that are at most quadratic in v and its 
derivatives. Through the formalism of the calculus of variations one can express the necessary 
condition for the minimum as the Euler-Lagrange equation which in this case can be shown to 
be a linear, self-adjoint, partial differential equation. Much of classical mathematical physics is 
based on this duality, and it provides numerous techniques for solving our problem, those which 
are directed towards the variational principle, as well as those which are directed towards its 
Euler-Lagrange equation. 

Whatever the strategy, although it is conceivable that the exact analytical solution u 
could be derived, it is normally impossible to do so for all but the simplest-shaped domains fi. 
Consequently, we are led to consider a numerical approach in which we somehow convert our 
continuous problem into a discrete problem whose numerical solution closely approximates the 
exact continuous solution u. We propose to employ what is probably the most potent tool for 
obtaining discrete approximations currently available — the finite element method. The method is 
applied to the variational principle directly and, because the variational principle is quadratic, the 
resulting discrete problem will take the particularly simple form of a linear system of algebraic 
equations. The main advantage of the finite element method is its generality. In the context 
of our surface approximation problem, it can be applied over domains ft of complicated shape, 
and it is not limited to uniform discretizations of these domains. The importance of the latter 
property in the context of vision is evident when one considers, for example, the nonuniform 
structure of the retina where it is known that resolution decreases approximately linearly with 
eccentricity (see [Wilson and Giese, 1977] and [Wilson and Bergen, 1979] for a quantitative model 
of this phenomenon in terms of the spatial-frequency channels in early vision). Moreover, the finite 
element method leads to linear systems which are readily solvable in a parallel, iterative fashion 

by a sparsely-interconnected network ol simple processors, a mechanism which seems prevalent in 
early vision. 
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For several reasons, we have avoided the alternate route of using the well-known finite 
difference method to discretize the associated Euler-Lagrange equations (see, e.g., [Collatz, 1966], 
[Forsythe and Wasow, 1960], [Smith, 1977]). The finite difference method is much more restrictive 
in that it practically limits us to uniform discretizations, the underlying convergence theory is 
much less well developed and, perhaps most importantly, it becomes very difficult to discretize 
the natural boundary conditions associated with our surface approximation problem, a task which 
is done trivially by the finite element method. 

The mathematical background of the finite element method that is of relevance to our 
problem is included in Appendix A for convenience. The appendix introduces the required theory 
and lists the fundamental theorems which we will invoke in applying the method to the task 
at hand. The process will consist of several steps. First we pose the variational principle in an 
abstract form that is the basis of the mathematical machinery presented in Appendix A. Next, 
we determine formally the requirements on the boundary conditions that must be satisfied to 
ensure that our variational principle is well-posed; i.e., has a unique solution. Only then can we 
proceed to apply the finite element method to approximate the solution. 


3. THE VARIATIONAL PRINCIPLE 


In this chapter we analyze the continuous variational principle which embodies our visual 
surface approximation problem, in preparation for the application of the finite element method. 
In view of the formalism presented in Appendix A, our first goal is to state the variational 
principle in the abstract form; that is, to isolate the energy inner product which characterizes our 
minimization problem. We then derive the associated Euler-Lagrange equation and, in the process, 
consider the various forms of boundary conditions that can be imposed. Finally, we choose the 
appropriate form of these conditions in view of our visual surface approximation problem and 
obtain formally the minimum requirements for our variational principle to be well-posed. 

3.1. The Energy Inner Product 


According to equation (2.2), our variational principle asserting that the equilibrium state of 

the thin plate is a minimal-energy configuration, may be stated mathematically as the minimization 
of the expression 


~ J j n 2 ( Au ) 2 — f 1 — cr K v t — v %)— 9 V dz dy 

~L p{s)vds -~L m{s) ^ ds+ 2 E! 


(■£i,yi)€C 


Here, we have assumed that the spring stiffnesses 0 {xi , yt) = 0 for all {x l>yi ) £ C. The admissible 
space V for our variational principle is in general a subspace of the second-order Sobolev space 
M 2 (fl) over the region J! (refer to the discussion in Appendix A). If u £ V minimizes £, then 
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£( u ) 5; £{u-\-ev) for any v £ V. Therefore, to obtain the necessary condition for the minimum, we 
substitute for v the small variation u -f- tv about u and equate to zero the derivative with respect 
to e for e = 0. Equivalently, we may perform the variation using the rules of differentiation: 

6 ' e ' (u) = / /, AuS ( Au ) -- (! - °)\ u - 2u,„,5(u,, / )] - gSu dx dy 

y (xi,j u)ec 

Since variation comimites with differentiation, 6(Au) — A(5u), <5(^) = 6(u yv ) == (5u) yy 

etc. If we now let v = 6 u, and set 6£ = 0, we obtain 

J In ~~ “ a )[ u xxv yy -f Uy y v xx — 2u xy v xy ) — gv dx dy 

~ Jan P[s)vdS ~ J m mU) £ dS + P E = 0. 


(iny»)6C 


Equations (1) and (2) may be cast in our abstract variational formulation of Appendix A. 
The key is in identifying the energy inner product as the bilinear form 


a (u, v) J J^AuAv (1 a)(u xx Vyy -f- u yy v xx — 2u xy v xy ) dx dy -f- /3 u(x t ,y l )v(x l ,y i ), 

(3) 


(*,-,y«)ec 


and in writing the linear form as 


f( v ) 


/ / gvdxdy + / p(sWs-f 
J Jn Jan 


I. 


/ \ d v 

mis)— ds-\-Q 
on on 


E 


hi,Vi)6C L 


C [xi,yi) v { x i> Vi) X c fi,, y< ) 


(4) 


Clearly then, (1) asserts that we are to minimize the quadratic functional £(v) — 1 a(v , v )_ f(v), 

as required in the definition of the abstract variational principle (Definition A.l). On the other 
hand, (2) which expresses the necessary condition for the vanishing of the first variation may be 

written as a(u, v ) = f(v), as expected from the discussion leading up to the variational equation 
(A. 10). 


Having obtained expressions for the bilinear and linear forms, we can proceed to bring the 
finite element method to bear on the problem. Before doing so, however, it is imperative that 
we carry the analysis further so that we can express the necessary condition for a minimum 
as a partial differential equation, explore the issue of boundary conditions, and ensure that the 
problem is well-posed. 


3.2. The Euler-Lagrange Equation and Boundary Conditions 

For the duration of this section, we will ignore the summation term arising from the (spring) 
constraints, since its presence will complicate the notation while making no significant contribution 
to the discussion. First, we will transform the energy inner product a(-, •) given in (3) using 
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integration by parts in two dimensions; i.e. Green’s theorem. Let n be the outward normal vector 
to Oft, t be the usual tangent vector along dVL, and ^ and A denote partial differentiation 
along the normal and tangent respectively. Assuming that u is fourth-order differentiable, Green’s 
identity (see, e.g., [Ciarlet, 1978, pg. 14], [Rektorys, 1969]) 



v Au dx dy 


-L 


dv 
u — 
a Ci dn 


du 

v— da 

on 


may be used to transform the term AuAv arising from the mean curvature cf the surface: 


/ AuAvdxdy = / / vA 2 udxdy+ / Au~ ds — [ v—-'Au)ds, 

J J Jn Jail on Jan. dn' 1 


(5) 


where A u A A u u xxxx -j- 2u xxyy -(- Uy yyy - On the other hand, the Gaussian curvature term 

can be transformed using the identity [Ciarlet, 1978, pg. 15] 


J u xx v yy -f u yy v xx — 2 u xy v xy dx dy — J 


d 2 u dv 
an dt 2 dn 


ds 


f 

Jan dndt dt 


dv 


ds. 


( 6 ) 


If the boundary is sufficiently smooth, it can be shown (see [Rektorys, 1980, pp. 268-269]) that 
the second boundary integral can be written as 


/ 

J ai 


j d f d ’* u \ 

J a 


'an ds \ dndt J 


v ds. 


’an dndt dt 

Substituting equations (5)-(7) into (3) (and ignoring the constraint term), we obtain 

a{u,v)= / / vA ? udxdy+ / P(u)vds-f / M(u)—da, 

J J n Jan Jan K J dn 


where 


P(u) 


M(u) = Au — (1 — cr) 


& ( A \ N d ( d 2 u \ 

(A„)+ (!-.)-(_ j ; 

d 2 u 


dn 


dt 2 ' 


(7) 


Thus, the necessary condition for the minimum (2) becomes 

I I (A 2 u-g)vdxdy+ I [P{u) - p{s)]v ds -[M[u) - m(s)] — ds = 0 
J Jdn Jan dn 

Now, since the above equation must hold and since v and are arbitrary on the closed 
region Cl, we must have 


A 2 u — g in Cl. (g) 

This is the fourth-order, linear, self-adjoint Euler-Lagrange equation that governs the small 
deflection of a thin plate at equilibrium, and it is satisfied by u inside R regardless of the 
boundary conditions on dCl. In its homogeneous form A 2 u = 0, it is called the biharmonic 
equation. Furthermore, u must satisfy the natural boundary conditions 
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P{u) = p(s) and M(u) = m(s) on dCl. ( 9 ) 

According to ( 6 ), the integral over Cl of the Gaussian curvature approximation (v xx v vu — 2 w 2 ) 
has no effect whatsoever on the Euler-Lagrange equation, but contributes only to the boundary 
conditions . 2 This reflects the fact that the Gaussian curvature of a surface patch is invariant as the 
surface is subjected to arbitrary bending, and that its average value over the patch depends only 
on the tangent planes of the surface around the periphery of the patch [Hilbert and Cohn-Vossen, 
1952, pp. 193-204]. This invariance property of the Gaussian curvature renders it inappropriate as 
a measure of surface consistency. For example, it cannot distinguish two developable surfaces such 
as a wildly undulating sinusoidal surface and a planar surface, both of which are cylinders and 
therefore have zero Gaussian curvature everywhere. On the other hand, the mean curvature does 
not in general remain invariant under bending and therefore plays a vital role in our energy inner 
product. This is evident from equation ( 1 ) — no value of a can make (An) 2 , which approximates 
the mean curvature, vanish . 3 

Another consequence of the necessary condition for the minimum is that the form of the 
natural boundary conditions satisfied by u are determined by any essential boundary conditions 
which may be imposed on v. In general, we can impose up to two essential boundary conditions, 
one on v and the other on First, consider the case of a simply supported plate where the 
essential boundary condition v = 0 is imposed on dCl but is left unconstrained. The solution 
u must then still satisfy the second condition in (9). We therefore have the Neumann boundary 
conditions 


u = 0, M(u) = m(s) on dtt, 
and, moreover, the first contour integral in ( 1 ) vanishes. 

Next, suppose that we also set §% — 0 on dCl. Then, 


du 

u = —— = 0 on dfl, 
on 


( 10 ) 


which are the Dirichlet boundary conditions for the clamped plate. In this case, both contour 
integrals in ( 1 ) vanish and, moreover, o is arbitrary since it does not appear in the Euler-Lagrange 
equation ( 8 ), but only in the natural boundary conditions ( 9 ) which have now been replaced 

by (10). We can therefore greatly simplify the variational integral. In particular, the functional 
minimization problems involving 


Expressions possessing this property are called divergence expressions [Courant and Hilbert, 1953]. 

Brady and Horn [1981, pg. 29] state that “the choice of which performance index to use is reduced 
to the square Laplacian, the quadratic variation, and linear combinations of them”. We stress that one 
should be careful not to choose that linear, combination which results in a divergence expression (the 
Gaussian curvature) and therefore hr,- an identically zero Euler-Lagrange equation. Recall from equation 
(2.1) that the small-deflection theory of the thin plate allows only a convex combination so, fortunately, 
it is free from danger, in this respect. : 
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£( v ) = J J n \(^ v ) 2 ~ 9 vd x d y, for a — 1, 

and 

£(v) = J -(v 2 xx 4- 2v\ y -f v 2 yy ) — gv dx dy, for a = 0, 
are equivalent in the Dirichlet case. 

Finally, consider the case of a free boundary; that is, when the externally imposed force p(s) 

and moment m(s) on the boundary are zero. Then, there are no constraints on v but, according 
to (9), u must satisfy 

P(u) = M(u) = 0 on dfl. 

These are the natural boundary cond tions satisfied by the solution for the case of the free plate. 
Once again, the contour integrals in (1) vanish and the energy functional takes the simple form 

<f(i>) = J ~(Au) 2 — (1 — a)(v xx v yy — v 2 xy ) — gv dx dy. 

In general, the admissible space V is the subspace of the Sobolev space X‘ 2 (fl) which satisfies 
the essential boundary conditions imposed on the plate. If, for example, a portion of the edge 
of the plate is simply supported, V will consist of functions which satisfy the essential condition 
v 0 on that portion of dfl. If part of the edge of the plate is clamped, then v = §% — 0 on 
that part of dfl. On the other hand, if part of the edge is free, then no constraints are imposed 
on v over that portion of the boundary and, in the case of the free plate, V = X 2 (Q). Of course, 
the plate cannot be “too free” on H, because then the physical system cannot achieve stable 
equilibrium and a unique solution would not exist. Precisely how much freedom can be allowed 
will be established formally in the next section. 

3.3. When is the Problem Well-Posed? 

Turning to our visual surface approximation problem, we should at this point choose the 
appropriate form of boundary conditions on dfl. Since the only information about the surface 
that is provided by the stereo module, for example, is embodied in the sparse constraints, the 
strategy of least commitment is to “assume nothing” about the boundaries of the surface. In 
terms of our plate problem, this means that we should impose no essential boundary conditions 
on the plate; that is, we solve the free plate problem whose admissible space V = 

If the boundary of the plate is free, it is clear that the constraints will play a critical role in 
providing a unique state of stable equilibrium. Our goal in this section is to specify the existence 
and uniqueness requirements mathematically as conditions under which the surface approximation 
problem is well-posed. To do this, we will invoke Theorem A.l, and satisfy its conditions by 
proving the following two propositions. 
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Proposition 1. The energy inner product a(-, •) i 3 symmetric. 

Proof. a(tt, u) — a(u, u ) is evident by inspection of equation (3). | 

Proposition 2. If the set of constraints C contains (at least) three noncollinear points, then a(-, •) 
is ^-elliptic for 0 < a < 1. 

Proof. We want to show that there exists an a > 0 such that a(v, v ) > aj|u|| 2 , for all v £V. 
To do so, it is sufficient to show that a(v, v) = 0 only if v == 0. We rewrite c(v, u) as 

V ) = j j n <r ( At ') 2 + (1 — <')(*'»» + 2v t, + dxdy+(j ) 2 - 

(x<,y.)ec 

Now, Au — 0 only if v is a harmonic function, while (v 2 xx -f 2u 2 y -f u 2 J = 0 only if v is a first 
degree polynomial (as can easily be shown by integration), which is a subclass of the harmonic 
functions. Thus, the integral is > 0 for 0 < a < 1 and it is zero only if v is a linear function 
over fl. On the other hand, since (3 is positive by definition the sum is also > 0 and it is zero 
only if v(x l ,y 1 ) = 0 for all (x t , y t ) £ C. Therefore, if C contains three noncollinear points, then 
a(v, v) = 0 only if v = 0, implying that a(-, •) is V-elliptic. | 

By Propositions 1 and 2 and Theorem A.l, we are assured that the continuous approximation 
problem is well-posed if 0 < o < 1 and the set of constraints includes at least three noncollinear 
points. The condition on the constraints is not unexpected in view of the arguments made in 
Section 2.1. Physically speaking, all unconstrained degrees of freedom of the plate must be 
precluded, and three noncollinear constraints is clearly the minimum requirement for this to be the 
case. In application to natural images, the stereo algorithm will almost always generate at least 
three noncollinear points, so we can, for all practical purposes, consider our surface approximation 
problem to be well-posed so long as 0 < a < 1. 

4. OBTAINING THE DISCRETE PROBLEM 

So far, we have been dealing with the continuous form of our surface approximation problem. 
We formulated it in the required abstract form, selected appropriate boundary conditions, and 
showed that it is well-posed in practice. In this chapter, we face the task of applying the finite 
element method to transform the variational principle into an appropriate discrete problem whose 
discrete solution can be computed fairly easily. Our piecewise continuous representation of surfaces 
will be based on a very simple finite element which is, however, nonconforming. This will force 
us to introduce an approximate variational principle and to show that it has a unique solution 
which converges to the exact solution as the elements shrink in size. Only then can we undertake 

the next step which is to derive the discrete problem explicitly as a linear system of algebraic 
equations. 
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4.1. Conforming vs Nonconforming Methods 

Our well-posed variational principle satisfies all the necessary conditions to guarantee that 
any conforming finite element method applied to it will converge. In principle, it is straightforward 
to apply a conforming finite element method according to the steps in Appendix A. We generate 
a finite element space S h which is a subspace of our admissible space V, and apply the Ritz 
method to find that function u h £ S h which optimally approximates the exact solution u £ V. 
The approximation is optimal in the sense that it is closest to u with respect to the strain 
energy norm a(-, •) = , or equivalently, that the strain energy in the error a{u — u h , u — u h ) is 
minimal. To construct a conforming finite element subspace, we must satisfy the completeness and 
conformity conditions given in Section A.4. Since the energy inner product a {•, •) contains partial 
derivatives of order m — 2, the completeness condition requires that the local polynomial defined 
within each element subdomain E must be at least a quadratic; P E Z) IT 2 (£') for all E £ T h 
(H 71 ^) denotes the set of n th degree polynomials over E). On the other hand, the conformity 
condition states that the polynomials must be of class C 1 across inter-element boundaries, and 
consequently S h C C’(n) globally. In satisfying both conditions, we are guaranteed that the finite 
element space is a subspace of the admissible space V, and that there exists a unique optimal 
approximation u h £ S h . 

If fl is a polygonal region, elements with straight sides will suffice. A number of such elements 
which are conforming for m = 2 (i.e., problems characterized by fourth-order Euler-Lagrange 
equations) are available. Examples are the Argyris triangle, Bell triangle, and Bogner-Fox-Schmidt 
rectangle (see, e.g., [Ciariet, 1978], [Strang and Fix, 1973], [Zienkiewicz, 1977] and the 
references therein). Unfortunately, we can expect serious computational difficulties to arise in the 
implementation of these conforming methods. The basic source of difficulty is the requirement 
of continuity of first partial derivatives across inter-element boundaries —- either the structure 
of the conforming element spaces P E becomes complicated, or their dimension is large. For our 
problem, the simplest conforming polynomial element is the Bell triangle, in which we have a 
quintic polynomial uniquely determined by 18 nodal variables consisting of the approximation v\ 
as well as its first and second partial derivatives at the three vertices. 

As is described in Appendix A, the dimensions of the finite element space can be reduced by 
the use of nonconforming elements. A popular nonconforming element for fourth-order problems 
is Admi’s rectangle, whose local function p E is a 12 degree-of-freedom polynomial with nodal 
variables being the approximating function, as well as its first partial derivatives at the four 
vertices. The element is nonconforming since it is only of class C° across inter-element boundaries. 
Many other nonconforming elements have been developed for fourth-order problems (see, e.g., 
[Ciarlet, 1978], [Strang and Fix, 1973], [Zienkiewicz, 1977]). 

kor this initial implementation, we have chosen to reduce the dimensions of the finite element 
space as much as possible by defining what for our problem is probably the simplest successful 
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nonconforming element imaginable. This element will be defined next. 

4.2. A Simple Nonconforming Element 

We will define a finite element space by the standard procedure outlined in Section A.4. 
Suppose that H is rectangular, and consider a uniform triangulation T h of H into identical square 
elements E, where the fundamental length h is the length of a side of E. By definition, we require 
that U£eT h & — ft and that the elements be adjacent and overlap along their sides. A point in 
fl is a node of the triangulation if it is a vertex of an elemental square and, as usual, we consider 
the elements to be inter-connected at, the nodes. The nodal variables, will simply be the node 
displacements ; i.e., the values of the function v h £ S h at the nodes. 

The next step is to define a space P E of polynomials p E over the element domain. The 
polynomials must satisfy the completeness condition which states that n 2 C P E , since the energy 
inner product contains derivatives of order m = 2. This is the requirement that the polynomials 
be able to reproduce exactly all states of constant strain which, in this case, are all polynomials 
up to degree two. We will satisfy this requirement by choosing P K to be the six-dimensional 
space of full second degree polynomials p E : E i-+ 5ft such that 

P E { X i y) = az 2 -j- hy 2 -f- cxy -j- dx -f- ey -(- /, (1) 

where the six rear parameters a to f are to be determined. 

We must ensure that p E is uniquely specified within E in terms of the node displacements. 
To do so, we isolate a representative element and set the ori b in of the X-Y coordinate system 
at its lower left hand corner, as illustrated in Figure 4. Our task is to choose a p E -unisolvent 
set of nodes, the displacements at which uniquely determine p E . An appropriate choice is 
the six nodes shown in the figure, whose node displacements are denoted by v, j £ 9?, for 
{iyj) ^ {( 1, 0), (0,0), (1,0), (0, 1), (0,1), (1,1)}. Expressing the six unknown parameters in terms 

of the node displacements is then a simple matter of substituting the displacements into (1) and 
solving the resulting nonsingular system of six equations. We obtain 

a — 2/~2( Vl *° ~~ 2y o,o H~ v_i (0 ), 

, 1 , 

° ~ 2 h 2 V °’ 1 ~ 2v °’° v °’— 

1 , 

v 0 ,l — Vi, 0 + V 0t o), ^ 

d=2A ( v.,°-y- 1 .o), 

C= 2 
/ = v 0 ,o- 

Of course, the six degrees of freedom of this element are insufficient to enforce C 1 continuity 
of v h across inter-element boundaries. Therefore, the element is nonconforming; S h C^H). It 
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Figure 4. Unisolvcnfc nodes for the nonconforming clement. 




'• 1,0 


Y 




is a simple matter to show that the polynomials p E are in general discontinuous across element 
boundaries, although continuity is maintained at the nodes themselves because each polynomial 
mterpolates its umsolvent set of nodal displacements. At this point, we acknowledge that our 
element is somewhat unorthodox in that the definition of p E requires nodal variables associated 
with two nodes which lie outside the element domain E. The justification for this transgression 
is that our element, as defined, will yield a discrete system whose matrix is particularly simple 
and uniform in structure. This will simplify the eventual implementation considerably. On the 
other hand, alternate arrangements for the unisolvent set of nodes are clearly possible. Perhaps 
a more appropriate choice from a biological standpoint would be a hexagonal triangulation with 
the unisolvent set of nodes placed at the vertices of hexagonal element domains E having sides 
of common length h. Regardless of our particular choice, the quadratic elements must first be 
shown to be convergent since they will invariably be nonconforming. This we will do in the next 
section for the simple square element. 
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4.3. The Approximate Variational Principle 

Due to the nonconformity of the elements, S h £ # 2 (n) = V, and the finite element space 
is not a subspace of the admissible space V . Therefore, in lieu of the energy inner product a(*, •) 
of equation (3.3), we are forced to substitute the approximate energy inner product a h (-, •), given 
by (A. 16), which can be written as 

a h (u h , v h ) = J2 S L Aul ' Avh -( l -^)(<< + <y«L-2n k x y iy )dxdy 
EeT h 

+ P uh ( x i>yi) vh {*i>yi) 

(xi,yi)ec 

r f (3) 

= E J j E ° + (1 - *)(.&«£ + 2« + n’; y v h yy ) dx dy 

T h 

+ /? uh { x i>yi)v h {xi>yi), 

(xi,yi)€.c 

where 0 < a < 1. The corresponding approximate variational principle and variational equation 
are given by equations (A.17) and (A.18) respectively. 

Does the approximate variational principle have a unique solution u h £ S h ? To answer this 
question, we proceed in the spirit of Section A.5 by equipping S h with a norm which we will 
employ to show that a h (-, •) is uniformly S^-elliptic. 

Proposition 3. The mapping ||v fc || fc : S h 3? defined by 




+ E ” h ( 

(xi,yi)ec 



where |e fc | 2>E = (/ ! K (v k xl f + (v k y f + ( v k y f dx dyf 
(A.3)), is a norm over S h . 


is the second-order Sobolev semi-norm (see 


Proof. ||-|| is a priori only a semi-norm over S h . Consider a v h £ S h such that |ju ;i || fe = 0. 
Then it must be the case that (i) \v h \ 2tE = 0 for all E £ T h , and that (ii) v h (x t , yi ) = 0 for 
all (x t) y t ) £ C. Because of their interpolatory nature, the local polynomials p E are continuous at 
all the nodes. Moreover, by condition (i), v h must be a first-degree polynomial inside every E. 
With a = 6 = c—0 m equation (2), it is a simple matter to show that this implies that v h is a 
continuous linear function over fi. Now, by condition (ii), v h is zero at all (x t ,y t ) £ C. Since the 
continuous problem is assumed to be well-posed, C contains at least three noncollinear points. 
Consequently v h = 0, and ||-|| h is therefore a norm. | 

Proposition 4. The approximate energy inner product a h (-, •) is uniformly S^-elliptic. 

Proof. 
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a. h (v\v h )= Y, f + (l- ^)l(vLf + 2^ y f+(v^]dxdy + 0 V v h (x ilV i? 

E£T h , 

= (!--)( E// f K‘«) 2 +«) 2 +Kv) 2 ^^+ E «*(*..n) 2 ) 

^ E y j F a ^ v ) + t 1 — ff )( l, is) dxdy + (0 ~\~ a — 1) E “*V>.!/.) S 

L’G T k 

>(1—^)11^112, for G < a < 1, /3 > 1 — cr. 

Since 1 — a is positive for 0 < <7 < 1, a ;i (-, •) is uniformly S^-elliptic. | 




Therefore, the approximate variational principle has a unique solution u* 1 G S \ Moreover, 
because the ellipticity is uniform, Strang's lemma (Theorem A.5) applies, and u h will converge 
to the exact solution u G V as h —► 0, if the approximation is consistent in the sense of equation 
(A.21). To verify'consistency, we apply the patch test, Theorem A.6. 


Proposition 5. The square, nonconforming element whose local, quadratic function is defined by 
(1) and (2) passes the patch test. 


Proof. Consider an arbitrary patch of four adjacent elements, all of which share a common node 
v *»i interna l to the patch, as shown in Figure 5. Now, suppose that we impose a constant strain 
condition on the patch; that is, suppose that we constrain the displacements at all remaining 
nodes around the periphery of the patch by assigning to them values consistent with the function 
7T2 G II , an arbitrary second-degree polynomial. Next, we solve the approximate variational 
principle (A.17) over the patch domain. This reduces to solving for the unknown displacement 
at the common unconstrained node v t>J such that it minimizes a quadratic equation. It is a 
matter of routine algebraic manipulation to show that the displacement obtained will also be 
consistent with n 2 (we omit the details). In fact, one can show that this is true for the internal, 
unconstrained nodes of an arbitrary patch of any number of elements whose boundary nodes are 
made consistent with 7r 2 . Therefore, the element passes the patch test. | 

Having proved the above propositions, we can now be secure in the fact that our approximate 
variational principle will provide unique discrete solutions which will converge to the exact solution 
of the continuous problem as the discretization is made increasingly fine. A limit to the order of 
convergence that we can expect from our approximation is given by (A. 15) — since our element 
is complete only through quadratics (k — 2), we are limited to a convergence rate of order h 2 in 
displacement (s = 0). For a more precise statement, we should take into account the consistency 
error term in equation (A.20). Nevertheless, we will bypass this complicated analysis because the 
consistency error is not expected to be large for smooth u which is normally the case when 
approximating smooth surfaces. 


4.4. The Discrete Problem 

We are finally ready to derive an explicit form of the discrete problem associated with our 
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i igure 5. Applying the patch test to four adjacent elements. 



i 


1 


approximate variational principle. There are essentially two ways of proceeding. One possibility is 
to find the Ritz basis functions <f> l which are associated with our finite element and which span 
the space S h . The basis functions are nonconforming piecewise quadratics with local support, and 
a basis function associated with each node of the triangulation. We can then use the variational 
equation (A. 18) directly, and write the discrete problem as the linear system of equations analogous 
to equation (A.13) by computing the matrix coefficients a h {^ it <f>j). Unfortunately, the piecewise 
continuous nature of the basis functions makes them tedious to manipulate, especially near 
the boundary. We will adopt an alternate approach which altogether avoids the derivation and 
manipulation of the basis functions. The approach is to solve the approximate variational principle 
by minimizing the functional £/,(•) of equation (A.17). Before doing so, however, we make two 
additional simplifications. 

The first simplification involves taking a conservative stance once more. There is no reason 
to believe that the human visual system is biased in the depth values it assigns such as, for 
example, making all of them too small or too large. That is to say, we have no reason to assume 
that there is an external influence on the surface other than that provided by the constraints C, 
and we should therefore nullify the externally-applied surface force: g(x, y) = 0. The linear form 
(3.4) for the free plate then reduces to 


f(v h )-0 53 

(Xi,yi)ec v * / 


( 4 ) 
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The second simplification involves the choice of a numerical value for the constant a in our 


approximate energy inner product a h {-, •) given by (3). According to the proof of Proposition 4, 
we are at liberty to choose any value in the range 0 < a < 1, therefore, the simplifying choice 
° ~ 0 will be made. 4 Setting a — 0 in (3), the energy inner product becomes 


a h {u h t v h )- f f E u ix v ix + 2u% y vi y +u* y v* y dxdy + (3 u h {xi,yi)v h {xi,yi). (5) 

E ^ h ( Xi,yi)£C 

Thus, according to equation (A.17), Ave obtain the simplified energy functional 

-5 £ j j E ^? + W + «f^y + { E [’'*(*■>»■) — c (x„yi)\ 2 - (6) 


Eer h 


(^i.y.)ec 


The expression inside the integral will be recognized as the "quadratic variation” expression used 
by Grim son [1981a]. 

Since the triangulation over the rectangular region H is that of a uniform square grid, it is 
convenient to impose on the nodes the natural lexicographic indexing scheme implied by Figure 
4. We index the nodes by (i, j) for » = 1 and j = 1,..., N y where N x and N y are the 

number of nodes along the x and y axis respectively. The total number of nodes is N = N x X N y . 

The displacement at node ( i,j ) is denoted by the variable vj^, and all the displacements together 
are denoted by the vector \ h £ . 


The next step is to express the functional in terms of the node displacements with the help 
of our element. Inside each element domain E, v h is a quadratic polynomial given by (1) and (2). 
Therefore, the second partial derivatives of p E are constant within E, and are given by 



E 

xx 

E 

yy 

E 

xy 


2 a — —— (v^i , — 2v^ . 4- v^ V 

/j2V UJ ' V t —1 ,])> 

1 


= 2b = — (y h , — 2v h .4- v h V 

^2 V *iJ+l ' V t,j—1 )> 

— c — — (v h — M h — v h 

/ l 2^ V ‘+l.i+i v t+l,> 



where it is assumed that i,j is the index of the lower left hand node of E. The form of these 
second derivatives will be recognized as being simply the finite difference approximations of order 
h 2 for the respective derivatives on a uniform, square mesh (see, e.g., [Abramowitz and Stegun, 
1965, pg. 884]). Of course, this result is a consequence of our particular choice of finite element, 
and it will lead to a particularly simple discrete problem. With other elements one cannot expect 
to obtain finite difference expressions, even for uniform triangulations. Substituting the expressions 


Recall that a is the Poisson constant of the elastic material, so our choice implies that the material 
does not change in width as it stretches lengthwise. Although this value is not meaningful physically, it 
is perfectly acceptable mathematically. Aside from a question of convenience, there is further evidence 
that supports this choice in terms of the optical laws of image formation (see [Grimson, 1982b]). 
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for the derivatives into v fl ) in (6) and noting that the area of each element is h? } we 

obtain 5 


= | e/ + W + (pfo 2 ** +1 E W, - eb) 


egt'* 


fc 2 


2 
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-t EK^+^+^j + f E 
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2h 2 ^ 2v »d + v ?— i,j) + 2 ( v H-i,>+i ~ y i,j+i ~ 4" v t ^) 2 

EeT h 

+ ( T Ui - ^ +f E ( v t - <E) 2 - 

We can write the above expression for the functional (aside from the additive constant term) 
in matrix form as 


&("*) = i(v\ AV) - (|\ y»), 


P) 


where (•, X 3? A ' 3? denotes the familiar Euclidean inner product, and A* £ 9t NN is a 

matrix of coefficients. Clearly, equation (7) is the discrete equivalent of the functional (6). For 
the linear term, we have f^ — (dc h where c h £ 3?^ is a vector whose entries associated with 
constrained displacements are the constraint values c^ and the rest are zero. On the other hand, 
the matrix A £ which forms the quadratic term, is a matrix of coefficients which can 

be broken down as the sum of two matrices: A h = A£ -f B h . The matrix is a diagonal 
matrix whose diagonal entries associated with constrained displacements are equal to f3, and the 
remainder are zero. As is clear from equation (A.13), the entries of the other component Aj can 
be interpreted as inner products between pairs of basis functions of the finite element space S h . 


Since the basis functions have local support, most of these inner products will be zero thereby 
making A h sparse and banded. Moreover, since by Propositions 1 and 4, the energy inner product 
is symmetric and S^-elliptic, A h is a positive definite, symmetric matrix. These are important 
properties from a computational point of view. 


To obtain the minimum of £h{v h ) we set to zero its partial derivatives with respect to each 
of the displacements vj 1 ^ . The minimizing vector of displacements u h satisfies the condition 

V4(u' l ) = AV l -f fe =0, 

(where V is the gradient operator) which yields a discrete problem in the form of a system of 
linear equations: 


A h u h = f\ ( 8 ) 

where the entries of A h are given by 

5 We also assume for simplicity that all constraints c l9itVi) coincide with nodes (i,j) in T h . Hence, we 
will denote the constraints by cf 
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Figure 6. Computational molecules associated with the discrete problem. 





From this expression, A h will be recognized as the Hessian matrix of the functional £ h (see, e.g., 
[Luenberger, 1973]). 

Although the evaluation of the Hessian matrix entries is routine lor interior nodes, it is 
tedious due to the special cases for the elements around the boundaries of the region Cl. We 
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omit the details and give the final result in terms of a set of computational molecules which are 
illustrated in Figure 6 in relation to the lower left hand edge of 0 whose boundary is indicated by 
bold links. Obviously, computational molecules for the remaining edges are appropriate rotations 
of those shown. The particular computational molecule associated with a node specifies the 
(nonzero) coefficients of the equation for that node. For example, the equation for the displacement 
at a node ( i,j ) in the interior of the region (indicated by the double circle in the topmost 
computational molecule in Figure 6) is 


8 


h 2( y i-i,j + v t+i,j + Y i,j —i + v !j+i) 
2 


+ fcaW-U-i + v H-U-i + + v 

1 


l + l >7 + 1. 


2,j ~f V t+2 ,j 4" V i,7—2 + v t,j-f2) 

20 u 

^ V, .7 ~ ^ C ‘r7- 


( 9 ) 


The terms involving (3 are present only if there is a constraint c iyj at that node. 

The sparseness of A h is evident from the above equation — matrix rows associated with 
interior nodes have only 13 nonzero entries, while rows associated' with nodes near the boundaries 
have even fewer. Also, note that the computational molecule for the center of the region is a 
factor of h 2 (due to the elemental area) times the finite difference approximation of order h 2 
for the biharmonic operator [Abramowitz and Stegun, 1965, pg. 885] that is associated with the 
Euler-Lagrange equation for our variational principle. This is an expected consequence of our 
particular choice of element which yielded finite difference approximations for the second partial 
derivatives of v h . Moreover, aside from multiplicative constants, the same molecules were obtained 
by Grimson [1981a] in the specification of a (conjugate gradient) mathematical programming 
algorithm. As was previously argued however, the finite element method is richer in that it 
systematically suggests many alternative, less-restrictive triangulations, as well as more general 
local representations for surfaces. 


5. MULTI-LEVEL SURFACE RECONSTRUCTION 

As we have seen, the application of the finite element method to a well-posed quadratic 
variational principle, such as the one on which our surface approximation problem is based, 
inevitably leads to an equivalent discrete problem which takes the form of a linear system of 
algebraic equations. The matrix of coefficients of this nonsingular system is symmetric, positive 
definite, sparse, and banded. Computing the most consistent approximating surface now amounts 
to solving this system and, in this chapter, we adopt an efficient hierarchical algorithm to perform 
this task. We will proceed to develop the algorithm and to demonstrate its performance it on 
constraints from stereopsis. 
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5.1. Possible Methodologies for Solving the Discrete Problem 

The solution of linear systems is a very important problem in numerical analysis and the 
many techniques which have been developed fall into essentially two broad classes — direct 
methods which yield the solution in a tinite number of steps, and iterative methods which typically 

converge asymptotically to the solution (see, e.g. [Dahlquist and Bjorck, 1974] or [Gladwell and 
Wait, 1979]). 

Direct methods include matrix inversion methods such as Gaussian elimination and LU 
decomposition. Although widely used for solving finite element equations, they usually do not 
exploit the sparseness and bandedness of the system matrix because, during the inversion process, 
the sparse matrix is transformed into a full one. 6 Consequently, all the elements of the matrix 
must be stored. Moreover, direct methods are typically global and sequential algorithms, which 
makes them unsuitable models for neurally-based visual processes. 

On the other hand, the class of iterative methods readily gives rise to biologically-feasible 
algorithms. Examples in this class are relaxation methods such as Jacobi relaxation, Gauss-Seidel 
relaxation and successive overrelaxation, as well as gradient methods such as gradient descent and 
the conjugate gradient method. Iterative methods exploit the sparseness of the matrix inasmuch 
as they do not modify its elements from one iteration to the next. Therefore, only the relatively 
few nonzero matrix elements need be stored. Owing to the sparseness and banded structure 
of the matrix, iterative methods require local-support computations only, and in certain forms 
such as Jacobi relaxation and gradient methods the computations can be performed in parallel. 
Because iterative methods in general and relaxation in particular are fundamental to the ensuing 
discussion, an introduction to some of the relevant mathematics of this class of techniques is 
included in Appendix B for convenience. 

The algorithms we are contemplating are to be executed by computational mechanisms in 
the form of networks of many simple processors, such as neurons, which are directly connected 
only to near neighbors. Due to the myopic nature of the processors, global interactions can take 
place only indirectly, through the iterative process, by an incremental propagation of information. 
Normally, the network is large and since this is reflected in the size of the linear system, we 
anticipate that a vast number of iterations will be required for any relaxation or gradient method 
to converge. Typically, the number of iterations will be on the order of N m , where N is the 
dimension of the matrix, and m is the highest order of partial derivatives present in the energy 
inner product, which in our case is two. Grimson’s [1981a] formulation of surface interpolation as 
a problem in mathematical programming naturally led him to the choice of a gradient method 
for its solution and, not unexpectedly, disappointingly slow convergence rates were observed due 
to the large size of the images typically encountered. 

b F° r a Positive definite symmetric matrix, the inverse matrix remains banded, but is no longer sparse 

withm the band. The inverse matrix is the discrete Green’s function for our problem, which in general 
has global support over 0. 
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Recently, a class of iterative techniques called multi-grid methods have seen increased 
application to the numerical solution of boundary value problems for which they achieve 
convergence in essentially order N number of operations [Brandt, 1977a, 1977b; Brandt and Dinar, 
1979]. This spectacular improvement results from the use of a hierarchy of grids to increase the 
efficiency of the global propagation of information. Multi-grid algorithms feature both intra-grid 
and inter-grid processes. Typically, the intra-grid processes are relaxation iterations, while the 
inter-grid processes are local polynomial interpolations. Therefore the multi-grid algorithms are, 
in principle, biologically feasible. A final issue which speaks in favor of adopting them to vision 
is the intrinsic multi-level structure of the earliest stages of the visual system itself and, as we 
argued in the introduction, the apparent need to maintain this structure at least to the level of 
the 2£-D sketch. 

We therefore advocate a hierarchical approach to surface reconstruction, which we will 
develop initially in the context of the Marr-Poggio stereo theory whose clear multi-level structure 
provides ample motivation. At the heart of the proposed scheme lies a multi-grid algorithm 
adapted to the fast solution of a hierarchy of discrete thin plate surface approximation problems. 
In the following sections, we present the underlying theory and build up a detailed description of 
the multi-level algorithm. 

5.2. The Multi-Level Equations 

As we have stated, the stereo module generates sparse depth information over a range of 
resolutions. The information at any particular scale can be thought of as a set of constraints 
which, at that level, define a well-posed, discrete surface approximation problem. It is natural 
then to view our surface reconstruction problem as the solution of a hierarchy of such discrete 
problems. The discretizations are performed in the usual way by introducing a sequence of finite 
element spaces S hl S hL over the rectangular domain H, where L is the number of levels 
and hi > • • • > h L are the fundamental lengths of the elements at each level. In the familiar 
notation, we will denote the functions which are members of the finite element spaces by (italic 
face) v hk E S hk , and the parameters (i.e., the nodal displacements) which define these functions 
according to (A.ll) by (bold face) vectors v hk E where N hk is the dimension of S hk . The 

hierarchy of problems is then given by the sequence of L linear systems (see equation (4.8)) of 
the form 

A hk u hk — f hk , 1 < k < L, (1) 

whose discrete solutions u hk E k for 1 < & < L define a sequence of functions u hk E S k which 
constitute the hierarchy of full surface representations. 

Although, in theory, there need be no restriction in the relationship of element sizes from 
one level to the next, a number of practical implementation-related issues point towards the 
subdivision of each square element domain on a given level S hk into four identical element domains 
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on the next finer level S ,lk+i ; that is, we choose h^ . Consequently, S hk will be a 

subspace of S hk +and the implementation of the inter-level processes is simplified substantially. 
Moreover, the 2:1 ratio is a natural one in view of the spatial-frequency bandpass channels in early 
vision whose center frequencies are spaced approximately on <5 octave, apart, the spatial'resolution 
of a channel being about twice that of the immediately coarser one [Wilson and Giese, 1979]. 
Finally, the choice can be shown to be near optimal in terms of the multi-grid convergence rate 
[Brandt, 1977a, pg. 353]. Since the triangulation of fi associated with our simple elements is a 
uniform grid of square element domains, the 2:1 ratio implies that in scanning along the x or 
y directions, every second node of a grid coincides with a node of the next coarser grid and, 
furthermore, that the number of nodes is related from one level to the next by N hk —i —- ^N hk 
Therefore, the total amount of space required to maintain all of the representations is bounded 

by N hL ( 1 + i 4; A H-) = iN hL ; i.e., it is only a small fraction more than that required for 

the finest grid. 


One can think of several possibilities for exploiting the hierarchy of discrete problems to 
increase the convergence rate of the iterative process. Perhaps the first idea that comes to mind 
is to solve the system at the coarsest level, which can be done vory quickly, and use the solution 
as an initial approximation in the iterative solution of the next finer level, proceeding in this 
manner to the finest level. This is an effective acceleration strategy which is almost as old as the 
idea of relaxation itself [Southwell, 1946]. Although it is suitable for obtaining a single accurate 
solution at the finest level, it cannot generate solutions having the finest-level accuracy over the 
hierarchy of coarser levels, since the approximation error increases as the elements become larger. 
This is undesirable from the point of view of our surface reconstruction problem. Here we require 
that the accuracy of the finest-resolution surface be maintained throughout the coarser surface 
descriptions. This will guarantee that the shape of the surface will be consistent over the hierarchy 
of representations. 

The stipulation that accuracy be maintained is further motivated by psychophysical studies 
into the phenomenon of visual hyperacuity (see, e.g., [Westheimer, 1977; Westheimer and McKee, 
1975, 1977]). Related computational studies indicate that, in principle, sharp, well-defined intensity 
edges can be localized to high (sub-receptor separation) accuracies from the V 2 G convolution 
values through a process of spatiotemporal interpolation [Barlow, 1979; Marr et al, 1980; Hildreth, 
1980; Crick et al., 1981; Fahle and Poggio, 1981]. Consequently, it seems that although the depth 
constraints arising from the larger channels in stereopsis represent coarser spatial samplings of the 
scene, excluding erroneous matches, the samples may provide highly accurate range information. 

The only way that consistent accuracy can be maintained throughout the hierarchy of full 
surface representations is to allow the coarser levels access to the high-resolution information in 
the finer levels. The multi-grid algorithm provides such a flow. The hierareny of levels cooperate, 
through a bi-directional flow of information, to simultaneously generate multiple, equally-accurate 
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surface representations, and do so with much less computational effort than would be expended 
in solving the finest-level system in isolation. To see how this is accomplished, we will initially 
consider only two levels, a fine one and a coarse one, associated with the finite element spaces S h 
and S respectively. Suppose that by some iterative process we obtain an approximate solution 
\ 2h to the coarse level system A 2/, u 2/l = f 2 \ which is then interpolated 7 to the fine level where 
it becomes the initial approximation v^: 


y h <- I v 2 \ 
2h=*h 


( 2 ) 


The mapping l 2 h=*h • S 2h S h denotes interpolation from the coarse space to the fine space 
Normally, y 1 ' will require substantial improvement. 


Let u h be the solution to the fine level system A h u h = f\ Then we can define the error 
vector in a given approximation y h by c h = u h — y h . Clearly, if e h could be computed, it could be 
added as a correction to v h , thereby giving us the desired solution. But because the computation 
of c h would take about as much effort as computing u ;i itself, doing so would not be helpful. 
On the other hand, if we could somehow approximate the error function e h by a function e 2h in 
the coarse space S 2h , such an approximation can be obtained quickly due to the fact that the 
coarse space has only one quarter the dimensionality of the fine space. Such an approximation is 
generally not possible, however, because e h , having been generated by an interpolation from the 
coarse grid solution, is certain to have large fluctuations with wavelengths less than Ah. These 
high-frequency Fourier components could not be approximated on the coarse grid because there 
they would alias as lower-frequency components. Before a meaningful approximation to the error 
can be obtained on the coarse grid, the high-frequency components must be eliminated; that is 
to say, the error function e h must be smoothed. 


Since smoothing is inherently a local operation, it should not be surprising that local iterative 
methods, inefficient as they are in obtaining solutions, are very efficient at smoothing the error 
function. In particular, although relaxation generally requires very many iterations to eliminate 
the global, low-frequency components of the error, it only takes a few iterations to eliminate 
the local, high-frequency components. This behavior can be predicted mathematically by a local 
Fourier analysis of the given iterative method [Brandt, 1977a]. The analysis involves a local Fourier 
expansion of the error function followed by an examination of the effect that a single iteration 
has on the amplitudes of each component. An important quantity which is obtained through this 
analysis is the smoothing factor Ji of the iterative scheme, which is defined as the worst (i.e., 
the largest) amplification of a high-frequency component of the error from one iteration to the 
next. As an example, in Appendix C we carry out a local Fourier analysis of the appropriate 
Gauss-Seidel scheme for our discrete problem, and show that jS « 0.8. This implies that, for our 
problem, ten Gauss-Seidel iterations on the fine grid are sufficient to reduce the high-frequency 

7 Lagrange interpolation of a suitable order may be used. 
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components of e h by approximately an order of magnitude. A more effective weighted Jacobi 
relaxation scheme, which is also suitable for our problem and gives a ju — 0.549, is described in 
[Brandt, 1977a, pg. 342]. 8 

Once the error has been smoothed, it may be inexpensively approximated on the coarse 
grid. The equation for on the fine grid is the residual equation 

A h e h — r h , where = f* — A h \ h ( 3 ) 

is called the residual of the approximation v\ The approximation to this equation on the coarser 
grid is 

A 2h e 2h = I r\ 

h^*2h 

where the mapping l h= > 2h :S k S 2h is an “interpolation” from the fine space to the coarse 
space. Because S 2h C S h , the mapping can be a simple injection or some local averaging of node 
displacements from the fine grid to the coarse. After e 2h is computed, a better approximation to 
\ h may be obtained by interpolating the coarse grid correction back to the fine grid; that is, by 
making the replacement: 

v*-<-v h + I e 2 \ 

2h=$h 

This correction practically annihilates the smooth part of the error e h . 

Brandt [1977a, 1977b] calls the foregoing scheme a correction scheme in view of the fact 
that the function computed on the coarse grid is the error function; that is, the correction to 
the fine grid approximation. The correction scheme is easy to implement, but it is unsuitable for 
our surface reconstruction problem because instead of an error function e 2h , we require that the 
function computed in the coarse space be a function u 2h which represents explicitly the distances 
to surfaces in the scene. This may be accomplished by a reformulation of the correction scheme 
equations which converts them into those of the related full approximation scheme. 

First, we rewrite (3) in the equivalent form 

A V 1 + e") - A == r\ 

which may be approximated by the corresponding coarse-grid equation 

A 2h { I y' 1 + e 2h ) - A 2h ( I \ h ) = I r \ 

h=*2h h=*2h h=$2h 

Defining a new function u 2h in S 2h by the nodal displacement vector u 2 ^ 1 = lk=* 2 h'V fl -f- e 2 ^, we 
obtain the coarse level system 

8 Brandt proposes this scheme for -olving biharmonic boundary value problems. The scheme is also 
appropriate for our surface approximation problem which is in essence a biharmonic problem in view of 
the associated Euler-Lagrange equation. 
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A 2h u 2h = g 2h , where g 2h = A 2h ( I v /l ) -f I r h . m 

h=*2h h=*2h K ' 

It is natural to interpret (4) as the original coarse-level system A 2h u 2h — f 2h but with a right-hand 
side which has been modified using information from the fine grid so as to maintain the fine-grid 
accuracy in the coarse-grid function u 2h . Thus, g 2h is an estimate of the local truncation error on 
the coarse level relative to the fine level (see [Brandt, 1977b, pg. 284]). 

Once the solution u 2 ^ of equation (4) is available, we can write e 2fl = u 2,t — I/i=* 2 /i v/l as the 
desired coarse-grid approximation to the fine-grid error, and the approximation on the fine level 
can be corrected by the replacement 


y h <- v h f I (u 2/l - I \ h ) 

2k=*h h=*2h 


(5) 


Note that since l 2h =*h h=*2h v h ^ v h the replacements given by (2) and (5) are not equivalent. 
Since u 2h is a low-frequency correction, the replacement indicated by (2) would destroy the 
high-frequency components of v h whereas the replacement indicated by (5) preserves them. 

How do we solve the coarse-grid equation (4)? The obvious answer is: by relaxation iterations 
on the coarse grid, and with the help of corrections obtained from still coarser grids. Thus, in a 
straightforward recursive fashion, we can extend the above two-level equations to any number of 
levels. In view of (4) and the fact that the residual for the level k equations is given by 

r hk = g hk — A hk u hk , (6) 

the multi-level equations for L levels are given by 

A hk u hk = g hk for 1 < k < L, ( 7 ) 

where 


g ht = f^; and 

g hk =A hk { I u fc * + , )+ I lg hk + l —A hk +'u /lfc + 1 ), for 0 < fc < L — 1 . ( 8 ) 


Note that the original, right-hand side f hk of the k th level problem occurs only on the finest 
level L. The right-hand sides of the coarser levels have been modified in order that the finest 
level accuracy be properly maintained throughout; that is to say, in order for the solutions 
to coincide: u^ 1 = I/i 2 =>h, u hz ==•••== l h2 ^ hl • • -I h L =>h L ^j u ht . Analogously to the two-grid case, 
we can interpret the difference of the original and the corrected right-hand sides, f hk — g hk , as 
an estimate of the local truncation error of level k relative to the finest level. 


5.3. Multi-Level Surface Reconstruction Algorithms 

We have motivated the multi-level approach to surface reconstruction and described in a 
quantitative manner its basic components — the intra-level relaxation processes, and the inter-level 
interpolation processes. It now remains to show how to bring the components together into an 
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algorithm for solving the multi-level equations (f) and (8). Several schemes have been proposed 
[Brandt, 1977a, 1977b; Brandt and Dinar, 1979]. We will describe one which is appropriate in terms 
of our surface approximation problem. 9 Before defining the full multi-level surface reconstruction 
algorithm, we will define its main procedure, the multi-level surface reconstruction cycle. 

The multi-level surface reconstruction cycle starts at the (currently) finest level l, making 
several cycles to the coarser levels k = <- 1 , 1-2, ..., 1 , until a hierarchy of surface representations 
which are as accurate as is possible in the S h ‘ space is obtained. Let denote a tolerance for 
solving the equations on level k. £ and r) are switching parameters which are given appropriate 
values below. 

Algorithm 1 — Multi-Level Surface Reconstruction Cycle 
Step 1 — initialize the finest level l. 

Set the right hand side of the level l problem A^u*' = g h ‘ to the original right hand 
side: g /l ' <— f h ‘. Introduce the initial approximation \ h ‘ <- Vq'. Set e t O, 10 and k <— l . 

Step 2 — start a new operation level k. 

Set <— oo. 

Step 3 — perform a relaxation iteration. 

Perform a relaxation iteration for the equation A^' fc u^* = g^ k and concur r ently compute 
some norm of the residual given by (6), <— ||r /lfc ||. 

Step 4 — test the convergence and its rate. 

If e fc < then convergence has been obtained at the current operation level; go 
to Step 6. If k — 1, go to Step 3. If < rje°^ then the convergence rate is still 
satisfactory, set e° 1( * <— ek and go to Step 3; otherwise the convergence rate is slow so 
go to Step 5. 

Step 5 — transfer to coarser level. 

Introduce as the first coarse-level approximation the function v hk ~' defined by the 
nodal displacements 


y h k-l i - J yh-k 

h k =4- h k — i 


Set the right-hand side of the coarser level problem A^ fc — = g flk ~ 1 to 

g hk ~ 1 4- \ hk ~ly hk ~l _|_ J (g/lfc _ J±h ky h k j 

hk =^hk — i 


Brandt refers to it as the accommodative, full multi-grid, full approximation scheme algorithm [Brandt and 
Dinar, 1979]. 

10 This value for a is temporary. A realistic value is introduced in Step 5. 
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(in view of equation (8)). Set the tolerance <— £e k . Concurrently with the 

computation of g k ~ 1 t compute the norm of the local truncation error — 1 _ 1 

using the same norm as in Step 3. If k = l set e t <- — g fc*— 1 1| n Finally, set 

k k — 1 and go to Step 2. 

Step 6 — use converged solution u hk to correct finer level. 

If A: < /, make the correction (in view of equation (5)) 

v hk+1 y h k +i J (u hk — I y hk + 1 ); 

h-k^hk-t -1 hk + i=*hk 

set k 4 — k -f- 1 and go to Step 2. Otherwise, if A; == / end. 


The relaxation operation in Step 3 can, in principle, be based on any one of the iterative 
methods described in Appendix B, but is usually a simple Jacobi or Gauss-Seidel iteration. For 
our surface reconstruction problem, in view of equation (4.9) and the discussion in Appendix B, 
the Jacobi relaxation iteration in the interior of H is given by 



while the Gauss-Seidel relaxation iteration is given by 


(9) 



where we have suppressed the superscripts h k indicating the level, and have instead introduced the 
bracketed superscripts which indicate the iteration number. Analogous formulas for the boundary 
points can be derived from the computational molecules associated with the boundary. The norm 
computed in Steps 3 and 5 can be the discrete L 2 or norm. In the case of Gauss-Seidel 
relaxation, it is quicker to compute the dynamic norm, as the iteration progresses, rather than 
the static norm (see, e.g. [Brandt, 1977b, pg. 286]). 


An important feature of the multi-level algorithm is that the local Fourier analysis, in 
addition to providing a prediction of the convergence rate, enables one to predict near-optimal 
values for the switching parameters. It turns out that the actual values assigned to the switching 
parameters are n ot critical, and that good values are £ = 0.2 and r, = Jj ,, where /2 is the smoothing 

"The constant l is the value of 2 (see [Brandt, 1979, pg. 65]), where p = 2 is the approximation 

order of the second partial derivatives of the energy inner product that is achieved by our quadratic 
element. 
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factor of the relaxation method used in Step 3 of the algorithm (see [Brandt, 1977b, pg. 290]). 
The order of the interpolation operators is determined by the problem itself; i.e. the order of 
derivatives in the energy inner product. For the coarse-to-fine interpolation I h k **h k+l in Step 
6, the natural second-degree interpolation of the element polynomial p E may be used. On the 
other hand, simple injections perform well for the fine-to-coarse transfers lh k =*h k —i in Step 5 and 
i =*h k in Step 6. 

Having defined the multi-level cycle which starts at the finest level l and cycles through the 
coarser levels, we will employ it as a procedure within a more general, full multi-level surface 
reconstruction algorithm. We now th nk of the level l of the cycling algorithm as the currently 
finest level ; i.e., the finest level for which an approximate solution has already been computed by 
the multi-level cycle. The full algorithm works in the opposite direction, the currently finest level 
progressing from’ the coarsest level / — 1, to the finest level l — L. 

Algorithm 2 Full, Multi-Level Surface Reconstruction Algorithm 
Step 1 — solve the coarsest-grid equation. 

Compute by relaxation an approximate solution u^’ to the coarsest-grid equation 
A fc «u fc * = f*>. Set l <- 1. 

Step 2 — set a new finest level l. 

If / = L stop. Otherwise increment the currently finest level l *- l -f 1, and set the first 
approximation on the new level to be the function uj' defined by nodal displacements 
V o' = I/M -!=*h, 

Step 3 — perform a multi-level cycle. 

Invoke Algorithm 1 and when it ends, go to Step 2. 

Note that the solution in Step 1 will be performed quickly because S h1 has relatively few 
dimensions. In Step 3, each time Algorithm 1 terminates at level l, we have obtained a hierarchy of 
/ representations whose accuracy is the best possible on level l. The currently finest approximation 
is then interpolated to the next finer level until the finest level L is reached. Brandt recommends 
a somewhat higher-order interpolation for the initial interpolation I,,,_ 3= * /l( in Step 2. Third order 
Lagrangian interpolation seems adequate for our surface interpolation problem, as we will see 
from the demonstrations in the next section. 

Algorithm 1 is accommodative in that it makes internal checks, based on the computation 
of norms, to determine when to switch levels. For many types of problems, accommodative 
algorithms behave in a fairly fixed manner, performing a similar number of iterations on each 
level before switching. It is then possible to avoid computing the dynamic-residual norm in Step 
^ Algorithm 1, and to preassign a fixed flow. A switch to the coarser level S flk ~ 1 is made after 
n c iterations have been performed at level S hk . Analogously, a switch to the finer level S hk + 1 is 
made after n f iterations have been performed on level S hk since the last return from the finer 
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level. n c depends on the smoothing factor, a good choice being n c = log .1 / log JX. Sometimes n f 
varies from level to level. For a more extensive discussion see [Brandt, 1979, pg. 68-69]. Fixed. 

algorithms are to be preferred for parallel implementations in general, and from a biological point 
of view in particular. 

In order to evaluate the performance of the multi-level surface reconstruction algorithm, we 
define a unit of computation called a work unit which is the amount of computation required to 
perform one relaxation iteration on the finest level L. It is roughly equal to wN hL , where w is 
the number of operations required to compute the residual at each node 12 and N hL is the number 
of nodes at the finest level. Since there are one quarter as many nodes on level A; — 1 as there 
are on level k, only 1/4' work unit is required to perform a relaxation iteration on levei L — ». 
The proportionate amount of computation done on coarser grids thus diminishes very rapidly. 
Although, for accommodative algorithms, it is difficult to predict the total number of operations 
consumed by the inter-level processes, it normally turns out to be considerably less than the total 
inter-level process computation, and is therefore usually ignored (see [Brandt, 1977a, section 6.2]). 

A final issue that we have not yet considered in quantitative terms is the choice of appropriate 
values for the (spring) constant (3. In the mathematical and, in particular, in the finite element 
literature, the constraint term cfr,(u) of equation ( 2 . 2 ) is known as a penalty Junction (see, e.g., 
[Courant, 1943], [Babuska, 1973], [Strang and Fix, 1973], [Zienkiewicz, 1974]). The incorporation 
of penalty functions into variational principles is a standard way of approximately satisfying 
essential boundary conditions by converting them into appropriate natural boundary conditions 
which may be handled straightforwardly by the finite element method. Penalty functions are 
particularly useful when the essential boundary conditions in question are complicated, or when 
only their approximate satisfaction is desired, as in the case of visual surface approximation. An 
optimal value for (3 can be derived through the following considerations. Let w be the solution 
to our surface approximation problem, which interpolates the known depth points. As usual, 
u £ ^ 2 (fi) denotes the exact solution to the variational principle ( 2 . 2 ), including the penalty term 
<f 5 , and u h £ S h denotes the finite element approximation to u. Then, there will be a balance 
between the error w — u which measures how closely the surface fits the constraints and the error 
u u h , due to minimizing over a finite element subspace [Strang and Fix, 1973, pp. 132-133], 
Analyzing this balance, Babuska [1973] determined that the optimal value for (3 is dependent on h 
and is given by /3 h = where 7 is a constant and k is the degree of the complete polynomial 

contained in S h . Therefore, for our quadratic finite elements, k = 2 , and the best value for /? at 
level j of the multi-level algorithm is J3y l] — 7 //i 2 . 

5.4. Examples of Multi-Level Surface Reconstruction 

Figur e 7, is a schematic diagram of the structure of the multi-level surface reconstruction 

12 w is determined by the specific relaxation scheme used, but due to the size of the support of the 
central computational molecule, it is approximately equal to 13. 
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algorithm, showing three levels of resolution. The diagram depicts the relaxation processes operating 
at each level, as well as the fine-to-coarse and coarse-to-fine processes which transfer information 
between levels. The algorithm transforms a hierarchy of sparse surface depth representations, such 
as might be provided through the independent stereo bandpass channels, into a hierarchy of full 
surface representations which constitute the full 2^-D sketch. The constraints for the surfaces 
shown in the figure are random samples from a surface which varies sinusoidally in depth. It is 
evident that the multiple full representations output by the algorithm, describe the sinusoidal 

surface over a range of scales and that the accuracy of the finest representation is maintained in 
the coarser ones. 

In this section, a number of examples of multi-level surface reconstruction are presented. 
We will consider the reconstruction of surfaces from artificially-generated constraints, as well 
as constraints generated from natural images by an implementation of the Marr-Poggio stereo 
theory. The performance of the multi-level algorithm is compared to that of single-level iterative 
algorithms. In the examples presented, the algorithm was started from identically zero initial 
approximations on all the levels. 


In the first sequence of figures, we present synthetic examples of surface reconstructions with 
the purpose of illustrating the performance of the algorithm in reconstructing quadric surfaces 
having zero, positive, and negative Gaussian curvatures. Constraints on each level were synthesized 
by sampling depth along arcs on the surface. The examples, involved four levels whose grids had 
dimensions N = N = 17, N^ == N = 33, = 65, and - N h y < = 129, 

with corresponding grid spacings h x = 0.8, h 2 = 0.4, h s = 0.2, and h 4 = 0.1. The relaxation 
method employed was the Gauss-Seidel method of equation (5.10), and a value of 2.0 was chosen 
for 7 , giving (3 h . — 2 . 0 //i?. 

Figure 8 shows depth constraints whose values are consistent with a cylindrical surface viewed 
at four resolutions. The constraints lie along arcs of greatest curvature. Figure 9 illustrates the 
hierarchy of full surface descriptions reconstructed by the four-level algorithm. Since the constraints 
on all the levels sample the same ideal cylindrical surface, the full surface representations coincide 
to a high degree of accuracy. Convergence was obtained after 12.0 work units. For comparison 
purposes, the finest level problem was isolated from the coarser levels and the same Gauss-Seidel 
relaxation algorithm was applied to it. Figure 10 shows the (single-level) approximation obtained 
after 800 work units (i.e., iterations). It is clear that we are still very far from convergence. 
Although the approximation is generally smooth, it has large low-frequency error components and 
the approximate surface lies far below its final value between constraints which are separated by 
fairly large distances. As predicted by the local Fourier analysis, it is precisely such low-frequency 
error components that local iterative algorithms have difficulty liquidating. In fact, the following 
characteristic phenomenon was observed. During the initial iterations, the corrections made to 
the approximation decreased rapidly, so that by the 800 th iteration they are minute, even though 
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Figure 


. Hierarchy of full surface descriptions generated by the multi-level algorithm. 
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Figure 10. Single level approximation after 800 work units. 



the error norm is still very large. Since there are 17,361 nodes in the grid, it may take on the 
order of (17, 361) 2 work units to obtain the solution without the help of the coarser levels. Thus, 
the multi-level algorithm is vastly superior when the constraints are far apart. 

Figures 11 and 12 show a synthetic example of the reconstruction of a hemispherical surface 
from constraints which form latitudinal circles. The hierarchy of full surface representations was 
obtained after 4.25 work units. Figures 13 and 14 illustrate an analogous example involving 
a hyperbolic paraboloid (saddle surface), where the constraints form parallel parabolic arcs. 
Convergence was achieved after only 2.5 work units (only a single iteration was performed on 
the finest level). Single level algorithms applied to the above surfaces exhibited poor convergence 
properties similar to the case of the cylinder. 

The above examples simulate a visual situation where the surface in the scene has reflectance 
changes in the form of widely-spaced rulings but is otherwise free of intensity changes. This is an 
unlikely situation in view of the fact that the visual world is full of textures, which often arise from 
surface material and pigment changes. Such textures generally result in relatively densely-spaced 
zero-crossings forming, to a certain extent, random patterns. In turn, these zero-crossings give rise 
to constraints having similar properties. Figure 15 illustrates a simulation of this situation using 
a surface varying sinusoidally in depth. A three-level surface reconstruction algorithm was used. 
The constraints input on each level were 30%-density randomly-located samples of the surface 
depth. In addition, to simulate the effects of possible inaccuracies in the constraint values, each 
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1 igure 13. Constraints at four scales consistent with a hyperbolic paraboloid. 









TERZOPOULOS 


MULTI-LEVEL SURFACE RECONSTRUCTION 


Figure 14. Hierarchy of full surface descriptions generated by the multi-level algorithm. 
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Figure 16. Single level approximation after 19 work units. 



sample was corrupted by zero-mean, uniformly-distributed, additive noise whose magnitude was 
one tenth the sample value. The algorithm generated full surface representation hierarchy in 18.75 
work units. Evidently our spring model for the influence of the constraints, with fi h . — 2jh 2 , 
is adequate for this case in that the additive noise has not adversely affected the quality of the 
reconstructed surfaces. The results after 19 work units (iterations) with single-level Gauss-Seidel 
relaxation on the isolated finest grid are shown in Figure 16 for comparison. It is evident that the 
approximation is still far from the true solution. In fact, a total of 71 work units was required 
to reduce the error norm to the magnitude obtained after 18.75 work units by the three-level 
algorithm. The saving in computation is less in this example than in the ones above because, 
first, only three levels were used and, second, the density of the constraints is greater. In general, 
the greater the density of the known depth values, the tighter the surface is constrained, and 
the convergence is expected to be faster. Another way to think of this is that as the average 
distance between constraints decreases, the efficiency of relaxation in liquidating the low-frequency 
Fourier components in the error increases and, therefore, the relative advantage of the multi-level 
algorithm is correspondingly diminished. 

The next examples illustrate the performance of the multi-level algorithm using disparity 
constraints generated by Grimson’s implementation of the Marr-Poggio stereo algorithm [Grimson, 
1981a, 1981b], which includes some of the modifications suggested by Mayhew and Frisby 
[1980, 1981] for exploiting disparity gradient constraints along zero-crossing contours. The stereo 
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Figure 19. Approximation on the isolated finest grid after 950 iterations. 



algorithm was run on three stereo pairs of images, shown in Figure 17, which were digitized 
to 320 X 320 pixels in 256 grey levels. The pairs from top to bottom are A) a synthesized 
random dot stereogram of a “wedding cake” of stacked planes, B) natural images of a portion 
of a coffee jar sprayed with “random dot” paint, and C) an aerial view of a set of buildings 
on the University of British Columbia campus. A three-channel version of the stereo algorithm 
was used. The resulting sparse disparity representations were reduced spatially by a factor of 
two, and input to a three-level surface reconstruction algorithm whose grids had dimensions 
jyhi _ Af/i, = 41, N h x * = iVy 2 = 81, N x 3 = fVy 3 == 161, with corresponding grid spacings 
hi = 0.4, /i 2 = 0.2, and h 3 = 0.1. The surface reconstructions in the examples is based on the 
raw disparities whose relation to depth is through a nonlinear transformation. Consequently the 
shapes of the reconstructed surfaces are distorted to a certain extent. 

The sparse disparity constraints provided by the stereo algorithm and the hierarchy of full 
surface representations generated by the three-level reconstruction algorithm for the “wedding 
cake” stereogram are shown in Figure 18. The value 7 = 0.5 was used in the algorithm, and the 
representations shown were generated after 16.75 work units. The three-dimensional structure of 
the planar surfaces is clearly evident at the three resolutions. The results can be compared to 
Figure 19 which shows the approximation obtained by a single-grid algorithm on the finest grid, 
after more than 900 work units. 

Figure 20 illustrates the sparse constraints and the full surface representations obtained 
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Figure 20. Disparity constraints and full surface representations of the colfee jar. 
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(with 7 = 0.1) from the images of the coffee jar . 13 The reconstruction required 16.0 work units. 
Finally, Figure 21 shows the sparse constraints and the reconstruction obtained from the aerial 
view (with 7 = 8.0), after 21.875 work units. The representations are displayed as grey level 
images, in which darkness is proportional to disparity. 

It should be noted from the above examples that the multi-level surface reconstruction 
algorithm, in its present form, attempts to reconstruct a single surface over the entire grid. As 
a consequence, serious problems arise near sharp changes in depth such as those due to partial 
occlusions of surfaces in the scene. The reconstructed surface gives the undesirable impression 
of a tablecloth thrown over a 3-D model of the scene. The source of this difficulty is discussed 
further in the concluding chapter, where possible ways of overcoming it are suggested. 

6. GENERALIZED INTERPOLATION PROBLEMS IN VISION 

The key to the solution of many problems in early vision is the imposition of constraints based 
upon assumptions about the visual world which are almost always true. A common assumption is 
that matter is cohesive; i.e, that surfaces are continuous over most of the scene. This assumption 
is usually introduced in the form of smoothness constraints, such as those characterizing the 
most consistent surface in visual surface reconstruction. From our study of this problem, we 
have seen that it is appropriate to formulate smoothness constraints within variational principles. 
In this chapter, we study a general class of variational principles, and we propose that the 
functionals characterizing these variational principles are appropriate semi-norms for formulating 
smoothness constraints because they possess several invariance properties which become important 
in applications to early vision. In order to simplify the discussion, the analysis will be in terms of 
interpolatory constraints and domains of infinite extent. Our visual surface reconstruction problem 
will be shown to be a special case of this generalized, optimal interpolation problem which is a 
natural generalization of the familiar curve-fitting problem involving splines. 

The classical spline problem involves the minimization of the quadratic functional 


under the interpolatory constraints v(xi) = c t , 1 < % < N c with N c > m > 1 , where the x t 
are given distinct points in [a, b ] and the c, are given real scalars. The natural setting for this 
problem is a vector space V formed by the class of functions whose (distributional) derivatives 
up to order m are in £ 2 ( 0 , b); that is, the class of functions which are elements of the Sobolev 
space of order m over [a, 6 ], M rn ([a,b}), defined in Section A.l. |-| m is a semi-norm which is 
derived from a semi-inner product and, equipped with it, V becomes a semi-Hilbert space. The 

13 In this example, the constraints for the coarser channels were generated by averaging down the 
finest-channel disparities. 
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conditions imposed on the constraints ensure the existence of a unique solution u £ S, where S 
is the convex subset of V whose elements interpolate the constraints. The characterization of u 
as an odd-degree polynomial spline and various intrinsic properties such as the minimum norm 
and best approximation properties follow from the orthogonal projection theorem [Ahlberg et ai, 
1967] (see also the proof of Theorem A.l). 

Duchon [1976, 1977] and Meinguet [1979a, 1979b] describe an n-dimensional generalization 
of the optimal, univariate spline interpolation problem. The generalized optimal interpolation 
problem involves the minimization of the functional |-|^, where 




d m v[x) 
dx (l ... dx lm 



( 2 ) 


and m and n arfe given positive integ 3 rs. The generalized interpolation problem is naturally set 
in a space V of functions which are elements of the Sobolev space l/ m (5R n ). 14 |-| m is a semi-norm 
whose null-space 15 is the M — ( n ^~™~ _1 ) dimensional space of all polynomials over 8 R n of degree 
less than or equal to m — 1: M = H m— ^SR 71 ) [Schwartz, 1966, pg. 60]. Equipped with the 
semi-inner product corresponding to |-| m , V becomes a semi-Hilbert space . 16 

Let the finite set of distinct constraints C — {(x,, c;) | 1 < i < N c , x* £ SR n , c, £ SR} contain 
a subset of M members such that there exists a unique element p£ Il m— 1 (SR n ) of the null space 
of |-| m which interpolates the M constraints in the subset; that is, such that there exists a unique 
polynomial of degree m — 1 which satisfies the conditions p(xj) — Cj for each j which indexes a 
constraint of the above subset. We call such a subset an fl -un ; 'olvent subset. 


We can pose the generalized optimal interpolation problem in the following way. Given a 
set of constraints C which contain an .A/-unisolvent subset, find that element u £ S such that 

= inf ML 

v£S 

where once again S is the set of functions in V which interpolate the constraints. The problem is 
well-posed because we are minimizing a semi-norm within a convex subset 5 of a semi-Hilbert space 
and, furthermore, the existence of an .A/-unisolvent subset of constraints reduces the null-space of 
the functional to at most a single nonzero element of S. A solution u is then guaranteed to exist 
and be unique by the orthogonal projection theorem (see the proof of theorem A.l ). 17 

14 More precisely, the space V is the Beppo Levi space [Duchon, 1977; Meinguet, 1979a, 1979b] of order 
m over SR" defined by BL m [l R") = {v | d a v £ L 2 for|a| = m}, where a = (ai,...,a n ) is a “vector” of 
positive integers and |aj = ai -}-••• + ot n ; that is, it is the vector space of functions for which all the 
partial derivatives of (total) order m are square integrable in SR". The Beppo Levi spaces are related to 
the Sobolev Spaces. 

15 The null space of a semi-norm is the space of functions which the semi-norm maps to zero. 

16 According to the Sobolev inequality (see section A.l), when m>n/2, V is a semi-Hilbert function 
space of continuous functions [Meinguet, 1979a, 1979b]. 

17 Moreover, as a consequence of the Sobolev inequality given in Section A.l, u will be continuous if 
m > n/2. 
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Why is the class of semi-norms defined in (2) important in the context of vision? As was 
argued recently by Brady and Horn [1981], many processes in early vision are approximately 
isotropic and, therefore, it seems that operators which model these processes ought to be 
rotationally symmetric. An example of such an operator is the edge operator proposed for 

computing the primal sketch [Marr and Hildreth, 1980; Hildreth, 1980]. The class of semi-norms 
defined in (2) is of interest, since all its members |v| m are invariant under rotation and translation 
transformations and, moreover, if a dilation or contraction x i-> Xx is applied to v, they are 
multiplied by some power of |X| [Duclion, 1977, pg. 86]. Therefore, corresponding interpolation 
methods will commute with any similarity transformations applied to the constraints. Clearly, 
these properties are essential for interpolation processes which contribute to th3 generation of the 
2 ^-D sketch — the surfaces generated by surface reconstruction algorithms should not change 
shape as the objects in the scene undergo translations or rotations parallel to the image plane, or 
undergo displacements directly towards or away from the viewer. 


For certain instances of m > 1 and n > 1, the general interpolation problem has familiar 
physical interpretations which are most often encountered in a differential form through the 
associated Euler-Lagrange equations. Consider first the one-dimensional case, n — 1. The 
generalized interpolation problem then reduces to the common univariate spline problem of 
equation (1). The particular value chosen for m determines the order of continuity of the optimal 
curve — as m increases, the smoothness of the solutions increases. In particular, for the case 
m = 1, |u|j = J^v x dx measures the energy in a string of infinite extent, and leads to interpolants 
having C° continuity. The associated Euler-Lagrange equation is u xx — 0. C 1 continuity may 
be imposed on the interpolant by choosing m — 2. In this case, \v\l = J di v xx dx measures the 
strain energy of bending in a thin beam of infinite extent, and the Euler-Lagrange equation is 
u xxxx = 0. This class of univariate semi-norms seems to be appropriate for imposing continuity 
constraints in the computation of optical flow along zero-crossing contours in the primal sketch 
[E.C. Hildreth, personal communication]. 

Next consider the generalized interpolation problem in two dimensions. For n — 2, the 
semi-norms become 


\v 


m 


//.-SCXspsM 1 **- 


m, once again, determining the degree of smoothness of the solution. For m = 1, |u|f — 
/ 1st 2 i V x -f- v 2 )dxdy measures the potential energy related to the statics of a membrane (rubber 
sheet), and the associated Euler-Lagrange can be shown to be Laplace’s equation, A u = 0 
[Courant and Hilbert, 1953, pg. 247]. A semi-norm of this order implicitly imposes the smoothness 
constraints in algorithms proposed for computing lightness [Horn, 1974], shape from shading 
[Ikeuchi and Horn, 1981], optical flow [Horn and Schunck, 1981], photometric stereo [Ikeuchi, 
1981], etc. With m = 2, the smoothness of the interpolating surface is increased to C 1 , the 
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functional taking the form \v\\ —- / J^a (v 2 xx + 2v\ y -f v 2 yy ) dx dy. This will be recognized as 
being our familiar functional representing the strain energy of the thin plate <fi with Poisson 
constant a = 0 (refer to equation (2.1)), whose Euler-Lagrange equation was shown to be the 
biharmonic equation, A 2 u — 0. 18 As we have demonstrated, this order of smoothness seems to 
be most appropriate in visual surface reconstruction from, e.g., stereo information (refer also to 
the discussion on the “quadratic variation” in [Grimson, 1981a]). 

It becomes clear that we are dealing with a class of quadratic variational problems, the order 
of whose Euler equations is determined by the degree of smoothness demanded of the solutions. For 
m = 1 we obtain Laplace’s equation in n dimensions, for m = 2 the biharmonic equation, and so 
on. In general, the Euler equation is an n-dimensional, linear, elliptic partial differential equation 
of order 2m. Moreover, the general interpolation problems have straightforward formulations 
as analogous approximation problems. For example, we can define appropriate constraint terms 
analogous to the term <f 5 (equation (2.2)) for our surface approximation problem. Hence, there 
exists a general framework in which to solve functional approximation problems, of the type 
arising naturally when imposing smoothness constraints in early vision. Meaningful, problems can 
be formulated in any number of dimensions, and the degree of smoothness that the solutions 
should possess can be specified a priori. In this sense then, the Sobolev spaces can be viewed 
as ingenious formalizations of the notion of the “degree of smoothness” of admissible functions 
and therefore are ideal domains in which to pose and solve these problems. By specifying the 
(order of) the Sobolev Bpace to which the solution should belong, we designate its position 
in the wide spectrum from very smooth functions to singular distributions. Satisfaction of the 
requirements, that the admissible space be a semi-Hilbert space and that the constraints include 
an .V-unisolvent subset will guarantee uniqueness. Needless to say, the theory of the finite element 
method is applicable to either the interpolation or approximation formulations, and it will dictate 
appropriate finite element discretization schemes for the associated variational principles. 

When solving these variational principles using local, iterative algorithms such as the 
ones described in this paper, smoothness constraints are imposed globally over retinocentric 
representations by a process of constraint propagation. Inspired by the work of Waltz [1975], a 
class of algorithms called relaxation labeling algorithms were introduced as cooperative, constraint 
propagation techniques in vision and image processing by Rosenfeld, Hummel, and Zucker [1976]. 
Although they have seen extensive use [Davis and Rosenfeld, 1981], their generality has made 
them difficult to understand and, unlike the techniques and algorithms which are the subject of 
this paper, the foundations of most relaxation labeling schemes are unfortunately poorly-developed 
mathematically. 


18 Duchcm [1977] understandably rc f ers to the solutions as thin plate splines which also reflects the 
fact that they are natural two-dimensional generalizations of commonly-used univariate splines. In the 
engineering literature .they are called surface splines [Harder and Desmarais, 1972]. 
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Recently, some theoretical understanding has been achieved by viewing relaxation algorithms 
as techniques for solving constrained optimization problems (see, e.g., [Ullman, 1979b], [Faugeras 
and Berthod, 1981], [Hummel and Zucker, 1980]). From this new point of view, the relationships 
between relaxation labeling techniques and iterative solution of finite element equations arising 
from variational formulations become clearer — relaxation labeling schemes can be viewed as 
iterative algorithms for solving optimal approximation problems over closed convex subsets (of 
possible labelings) [Hummel and Zucker, 1980]. Necessary conditions for solutions (fixed points) 
are then expressed as sets of variational inequalities [Ciarlet, 1978; Kinderlehrer and Stampacchia, 
1980] and appropriate updating rules are natural generalizations of the classical local iterative 
methods for solving large systems of linear [Young, 1971] or nonlinear [Ortega and Rhrinboldt, 
1970] equations. Moreover, if the compatibility functions among neighboring nodes are symmetric, 
then there exist associated variational principles defining equivalent formulations as minimization 
problems. Fortunately, it is possible to apply the finite element method to nonlinear problems 
stemming from variational inequalities [Ciarlet, 1978]. In a certain sense then, finite elements can be 
viewed as systematically-derived, physically-based compatibility relationships among neighboring 
nodes. In view of the relationships between the two techniques, it is hoped that aspects of our 
multi-level approach to solving the discrete finite element equations for the surface reconstruction 
problem may contribute to the theory of hierarchical relaxation labeling [Davis and Rosenfeld, 
1981; Zucker, 1978; Zucker and Mohammed, 1979]. 

7. SUMMARY AND EXTENSIONS 

Information about the shapes of visual surfaces that is inferred from the retinal images in 
the early computational stages in vision is sparse. Nevertheless, our perception is that of full 
(piecewise) continuous surfaces. In this paper we have proposed a hierarchical approach to the 
reconstruction of full surface representations, consistent with our perception of the visual world. 
The foundations of our paradigm are embedded in a tight mathematical formalism which at 
the same time seems sufficiently general to encompass many aspects of the complex information 
processing task which is the generation of the full 2|-D sketch. 

Visual surface reconstruction was formulated as an optimal approximation problem having 
an intuitively simple physical interpretation — a thin flexible plate which is allowed to achieve an 
energy-minimizing state of stable equilibrium under the influence of externally-imposed constraints. 
This physical model led directly to an analysis in terms of the calculus of variations, and a 
proof that the problem is well-posed in practice. The model also suggested a class of techniques 
for optimally approximating the continuous solution by an equivalent discrete problem which is 
amenable to computational solution. We chose to apply the finite element method for reasons 
which include its generality, the availability of a tight theory governing its use, the simple 
discrete problem to which it gives rise, and its promise in vision as a systematic methodology 
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for constructing local representations of surfaces . 19 At each step, the underlying mathematical 
theory assured us that, ultimately, our problem would have a unique solution that, in principle, 
could be computed by biologically-based mechanisms. Our search for efficient algorithms and 
our insights into the multi-level structure of the early processing stages in vision led us to a 
multi-level algorithm which solves simultaneously a hierarchy of surface approximation problems 
spanning a range of resolutions. The local-support processes comprising the algorithm include 
iterative intra-level relaxation processes, and inter-level processes which serve to communicate 
information between levels. The inter-level process include injections from line grids to coarse 
and polynomial interpolations from coarse grids to fine. Tests on stereo da*a verified that our 
multi-level surface reconstruction algorithm meets theoretical expectations of increased speed and, 
moreover, generates a potentially useful hierarchy of consistent surface representations. Finally, 
we examined our basic surface approximation problem in a more general setting, and related it 
to a broader class of optimal approximation problems based on semi-norms that commute with 
similarity transformations applied to the constraints, a property which is important in the context 
of vision. 

Although we have laid down the foundations of our approach primarily in terms of stereopsis, 
the methodology is by no means limited to the type of information produced by this particular 
module. Indeed, our point of view speaks to the broader issue of how to combine the information 
about the shapes of visible surfaces generated by various vision modules ;nto a self-consistent 
whole. Several possibilities arise, some of which we will now consider briefly. 

The simultaneous assimilation of information from different sources can be realized by 
defining more sophisticated penalty functions to replace £5 in Chapter 2. For example, in the case 
of depth constraints from, say, stereo and motion, we can straightforwardly introduce additional 
terms of the same form as £5 for each process. In terms of our plate model, we introduce two sets 
of imaginary pins with attached springs, and allow the possibility of a constraint generated by 
stereo to coincide with one provided by motion. Imperfections in the retinal images are likely to 
affect the two processes in different ways, and moreover each will in its own way sporadically fall 
prey to gross misinterpretations of the information in the primal sketches. Whatever the situation, 
our physical model assumes an energy-minimizing state, and the resulting surface is an optimal 
compromise in view of the constraints provided. In places where the information is consistent, the 
final interpretation is reinforced. In places where there is a conflict, it is resolved by competition 
with nearby constraints from both processes. 

The influence of each constraint may be controlled, possibly dynamically by the processes 
themselves, by assigning different values to each spring constant. For example, different confidence 
values may be given to individual constraints generated by the stereo matcher, according to 

19 On this latter point we should mention again that the finite element method allows us to handle 
domains of complex shape, natural boundary conditions, and to set up nonuniform discretizations of the 
domain — e.g., to vary the resolution across the domain. 


60 



TERZOPOULOS 


MULTI-LEVEL SURFACE RECONSTRUCTION 


regional statistics of the rate of successfully matched zero-crossings, which may have to be 
computed anyway [Marr and Poggio, 1979; Grimson, 1981a, sections 2.5 and 3.4]. The extent of 
the constraint’s influence on the surface may also be varied by extending our model to that of 
an inhomogeneous plate, whose flexibility varies over the domain. Numerous possibilities exist for 
defining weighting functions to apply to the strain energy density of the plate. All such proposals 
for modifying the form of the functional must be first be shown to lead to well-posed problems, 
by extensions of the analysis carried out in Chapters 3 and 4. 

Another important issue is how to incorporate information other than measurements of the 
distance to the surface. An important class of processes generate cues about visual surfaces in 
the form of local orientation measurements. Examples in this class are the analysis of occluding 
contours [Marr, 1977], as well as “shape-from” processes such as shape from shading [Horn, 1975], 
contours and texture [Kender, 1980; Stevens, 1981; Witkin, 1981], regular patterns [Kanade, 
1981], etc. The finite element method provides a general way of handling orientation constraints 
through the use of elements with degrees of freedom which include the first partial derivatives of 
the surface. An appealing example is Adini’s rectangle which was described in Section 4.1. Surface 
representations based on this element would make explicit the information about the local slope 
of surfaces, as well as their distance from the viewer. Discrete problems derived by applying this 
element would correspond to a coupled system of two discrete Euler-Lagrange equations for the 
plate, a fourth-order equation for the displacements at the nodes, and a second-order equation 
for the slopes. On the other hand, we pay a price for this added capability — the dimension of 
the finite element space is tripled, making the resulting discrete system even larger. Nevertheless, 
the price may turn out to be worth paying in order to obtain useful surface representations 
and, moreover, it may not be too high when massively parallel computational mechanisms are 
contemplated. 

A different way of handling orientation constraints which can be used with our simple 
quadratic elements is, once again, by the use of appropriate penalty functions. In terms of our 
model, we can imagine the situation for a single constraint and a surface patch as illustrated 
in Figure 22. Here, we attach a spring between the surface normal, an imaginary quill rigidly 
fixed to the plate’s surface at a particular point, and the orientation constraint, another quill 
emanating from the same point, but having a fixed orientation in space. Given this arrangement, 
the surface is “pulled” locally so that its orientation tends to align with that of the constraint. 
The appropriate penalty term is the potential energy in the spring. This energy can be expressed 
straightforwardly in terms of the first partial derivatives of the quadratic surface patch within 
the element, and ultimately in terms of the node displacements. 

A more immediately important issue, one that was raised in reference to the examples of 
surface reconstruction presented earlier, is that of dealing with depth discontinuities. In its present 
form, the surface approximation algorithm can deal in a meaningful way with scenes containing 
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Figure 22. Physical model of the effect of an orientation constraint. 


orientation constraint 



only a single surface. This is due to the fact that it does not incorporate the notion of an occluding 
contour, that is to say, it attempts to fit a single surface over the whole sparse 2^-D sketch, 
interpolating indiscriminately across contours which correspond to places where surfaces in the 
scene occlude one other from the viewer. Clearly, this action is inappropriate since the surfaces 
on either side of the occluding boundary ought to have no influence on one another. Moreover, 
the variational principle for surface approximation was based on the small deflection theory of 
the plate 20 and, consequently, we expect our surface to behave strangely in the vicinity of a large 
change in depth, resulting in, for example, a Gibb’s phenomenon similar to that observed when 
approximating discontinuous functions with Fourier series. 

How can these depth discontinuities be detected and how do we prevent interpolation 
across them? Grimson [1981a, section 9.4] noted the importance of this question and made some 
speculations about possible answers to the first of its two parts. The feasibility of his suggestions 
remains an open question. Here, we would like to propose another approach that is again suggested 
by our physical model. In places of sharp changes in depth (or surface orientation), the strain 
energy in the plate will be locally high. Measuring this energy locally is a simple matter — we use 
the energy inner product to compute the strain energy norm a^(v h , v h )* over the element domains. 
Points of high strain energy are likely candidates for inferring the presence of discontinuities in 

20 The large deflection plate bending theory is considerably more complicated and leads to an 
Euler-Lagrange equation in the form of two coupled, nonlinear, fourth-order partial differential equations 
known as von Karmann’s equations (see, e.g., [Landau and Lifshitz, 1970] or [Mansfield, 1964]) 
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depth. We can also exploit our expectations about the world for added constraint, and assert 
that, since most of the retinal image is made up of coherent surfaces, occlusions in depth are 
likely to form contours in the image and not be sparsely-occurring points. Hence, we look for 
contours along the surface of the plate, where the strain energy is high. Having located the 
occluding contours, the answer to the second part of the question is simple, in principle. To 
prevent interpolation across different surfaces, we “break” the plate along occluding contours. 
Mathematically speaking, this is done by removing plate elements along the occluding contour, 
thereby introducing free boundaries. 

Our multi-level approach to surface reconstruction constitutes a computational paradigm 
which has contributed toward a more complete understanding of the generation of the 2^-D sketch. 
Many details such as the combination of information from the various modules in early vision and 
the isolation of depth discontinuities remain to be worked out rigorously within the paradigm. In 
addition, a number of exciting issues are raised. For example, how can the hierarchy of surface 
representations generated by the algorithm be used to advantage during later computational 
stages in which three-dimensional, object-centered representations are generated and objects are 
recognized. Similar implications directed to the related field of robotics and manipulation also 
suggest themselves. Research addressing some of these issues is currently in progress. 
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A. THE FINITE ELEMENT METHOD 


When it is impossible to derive an analytical solution to a continuous variational principle, 
it is usual to attempt an approximation by defining a discrete problem which is similar to the 
continuous one and which leads to a discrete solution. To this end, we will first state an abstract 
variational principle which will lead us to an optimal approximation to the exact solution. The 
variational principle is called abstract inasmuch as it represents a formulation which is common to 
a variety of physical problems, such as the physical model for our surface reconstruction problem. 
We will also state theorems which give conditions guaranteeing the existence and uniqueness of 
the approximate solution and, in addition, we will discuss the optimal properties of the proposed 
approximation. 


The abstract variational principle and the associated theorems are stated in a form which is 
convenient for the application of the finite element method, a powerful technique for obtaining, by 
numerical means, discrete solutions to variational problems. 1 The following sections develop the 
mathematical machinery which we will require to successfully apply the method. Key mathematical 
ideas include a set of Hilbert spaces (the Sobolev spaces) and their norms, a bilinear form (the 
energy inner product) which is naturally associated with the specific problem, and certain optimal 
properties of the (Ritz) approximation over finite dimensional subspaces. These ideas lead to 
a clean and precise theory governing the application of the finite element method, even for 
complicated geometries. Comparatively tight theories are unavailable for alternate approximation 
techniques which naturally arise from nonvariational problem statements; e.g., the finite difference 
method which can be applied to equivalent formulations in terms of differential operator equations 
(such as Euler-Lagrange equations). Excellent accounts of the mathematical theory of the finite 
element method are [Ciarlet, 1978], [Oden and Reddy, 1976], and [Strang and Fix, 1973]. An 
extensive development from an engineering point of view is presented in [Zienkiewicz, 1977]. 

A.1. The Sobolev Spaces 

Fundamental to finite element analysis are a set of spaces called the Sobolev spaces (see, 
e.g., [Agmon, 1965], [Adams, 1975]). They are a generalization of the familiar L 2 space which 
consists of all functions v: ft C 1 —'► 3? (where fi is a bounded domain) whose L 2 norm over fl 



is finite. We denote the partial derivatives of v by the notation 


d a v(x) = 



v{x), 


x The finite element method was conceived by Courant [1943]. 
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where a — (a-i,..., a n ) is a multi-index of positive integers a ,. 2 The Sobolev norm of order m 
over fl combines the L 2 norms of all partial derivatives of v up to order m: 

IM|m,n = r /j 3% l 2tix ] » (1) 

'*|q| < m ' 

where |a| — aq -}~ * * • 4~ «n- The Sobolev space of order m over fl is then defined by 

r n (Q) = {v I IMU.n < 00}. (2) 

Clearly, = L 2 . 

We will also require the associated semi-norm 

Mm,n = f X) L l 5rtv | 2rfx ) (3) 

'■|a| — m ' 

which includes only the derivatives of order m exactly. It is a semi-norm because it is zero if 
v — 7 T m —1 € n w “ 1 (n), where II m—1 is the space of polynomials of degree m — 1 defined in fl 
[Strang and Fix, 1973, pg. 298]. 

Since the Sobolev norms are sums of L 2 norms, they have associated inner products and, 
therefore, the Sobolev spaces are Hilbert spaces . 3 Let C^fl) denote the class of functions which 
have continuous partial derivatives of all orders up to order q. A fundamental embedding property 
of the Sobolev spaces is given by the Sobolev inequality which states that C q C M rn if and only 
if m — q > n/2, where n is the dimension of 3ft n . 

A.2. An Abstract Variational Principle 

Let V be a normed vector space with norm ||*||, and S be a nonempty subset of V . Moreover, 
let a(-, •): V X V t-+ 9? be a continuous bilinear form and f:V 9? be a continuous linear form. 

Definition 1 — abstract (quadratic) variational principle 
The problem: find an element u s such that 

u s eS and £(u s )— inf <f(v s ), (4) 

v s es 

where the functional £:V h-> 3J is defined by 

£(v)= i<z(v, v) — f{v), (5) 


2 The derivatives are to be interpreted in the generalized (distributional) sense, but when a derivative 
exists in the classical sense, it is equal to the generalized derivative (see, e.g., [Schwartz, 1966]). 

Extensions of the definition of Sobolev spaces and norms have been made to negative and nonintegral 
order m (see, e.g., [Agmon, 1965], [Adams, 1975]). 
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will be referred to as the abstract variational principle. 

Theorem 1 — existence and uniqueness 
Assume in addition that 

(i) the space V is complete , 4 

(ii) S is a closed convex subset of V, 

(iii) the bilinear form a(-, •) is symmetric, 

(iv) a(-, •) is V-elliptic; 5 i.e., there exists a constant a > 0 such that 

Vu (: V, a(v, v) > a||u|| 2 , (6) 

then the abstract variational principle has a unique solution u s G 5. 6 

Proor. [Ciarlet, 1978, pg. 3] The bilinear form a(-, •) is an inner product over the space V. Since 
it is continuous,’it is bounded, 7 and because it is also V'-elliptic, there exist constants a and /j, 
such that a||u|| 2 < a(v, u) < /z||u|| 2 . Therefore, the norm associated with the inner product is 
equivalent 8 to the given norm |j|j, and V becomes a Hilbert space when it is equipped with this 
inner product. According to the Reisz representation theorem (see, e.g., [Oden, 1979] or [Yosida, 
1971]) there exists an element u E V such that 

Vv£V, f(v) = a(u,v), (7) 

and because a(-, •) is symmetric, 

— a ( u > vS ) = \ a ( vS — «, v s — u) — ^a(u, u). 

Therefore, the minimum of £(u s ) and the minimum of a(v s — u, v s — u) as v s ranges over the 
set S are achieved by the same element u s E S. In other words, solving the abstract variational 
principle is equivalent to finding an element u s E S which is closest to u with respect to the 
norm a (; •)*: 

a(u — u s , u — u s )2 = inf a(u — v s , u — (8) 

V s £S 

By the projection theorem, (see, e.g., [Oden, 1979] or [Yosida, 1971]) the solution is the projection 
of u onto S with respect to the inner product a(-, •), and its existence and uniqueness is assured 
by the fact that S' is a closed convex subset of V. | 

4 That is to say, it is a Banach space. 

5 V-ellipticity means that the bilinear form is positive definite ; i.e., a(v, v) = 0 if and only if v — 0. 

6 Theorem 1 is a generalization of the familiar theorem for the existence of a unique solution to a 
(quadratic) minimization problem in mathematical programming (see e.g. [Luenberger, 1973]). 

7 A bilinear form is continuous if and only if it is bounded; i.e., there exists a constant p such that 
| a (u, v)| < M|H|||t;|| [Rektorys, 1980, pg. 111]. 

8 Two norms ||-|| and j[-||' on a linear vector space V are called equivalent if the corresponding 
metrics are equivalent. This amounts to the existence of two positive constants c\ and C 2 such that 

HI < HI' < c 2!MI- 
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The abstract formulation encompasses linear variational problems which are posed classically 
in terms of variational principles involving the minimization of quadratic functionals <f(v) = 
\a(v , v)— f(v) over an admissible space of functions v E V. Such functionals often represent the 
potential energy of a physical system, a(v, v) being the second-degree term which is the strain 
energy in the function v (f(v) is a first-degree term). The associated inner product a(u, w) is the 
energy inner product which is intrinsic to the particular variational principle, and is defined for 
all admissible functions v and w. 

It is clear from the above discussion that the admissible space V must be complete and that 
<f(v) must be well-defined for all v £V (i.e., that V must be a space of finite energy). The Sobolev 
spaces fulfill these conditions. Their use as generalized energy spaces is natural in the sense that, 
for a given variational principle, the energy inner product is well-defined over a Sobolev space 
)i m ', where m is the highest order of partial derivative of v wdiich occurs in a(v, v). In general 
then, V is the space "H m whose natural norm is the Sobolev norm ||-|| m . 

The role of the subset S is in approximating the exact solution u. Although u is usually 
impossible to obtain over the full admissible space V, it may be relatively straightforward to 
optimally approximate it by an element u s E S, especially if S is taken to be a closed subspace 
of V. The approximation is optimal in the sense of equation (8). In the ensuing discussion, we 
will restrict ourselves to the special case where S is a closed subspace of V. The approximate 
solution of the abstract variational principle may then be characterized by the following theorem. 

Theorem 2 — variational equation 

If S is a closed subspace of V, then u s £ 5 is a solution of the abstract variational principle 
if and only if it satisfies the variational equation 

Vv s E S, a{u s , v s ) = /(v s ). (9) 

Proof. If u s minimizes £ over S, then for any e and v s E S, 

£(u s ) < £(u s -f- ev s ) = - a(u s -f- ev s , u s -f- ev s ) — f(u s -f- ev s ) 

z 

= £ (u 5 ) + e[a(u s , V s ) — /(u s )j + ^ e 2 a{v s , u s ). 

Therefore, 

0 < e[a(u s , v s ) — f(v s )] -f ^e 2 a(v s , v s ), 

and since this must be true for small e, both positive and negative, it follows that a[u s , v s ) = 

/( v s ). 9 | 

9 In the. general case where S is not a closed subspace, but is only a closed convex subset of V 
as required by condition (ii) of Theorem 1, the solution u 5 must satisfy the variational inequality 
a(u s , — u s ) > f(v s — u s ) [Ciarlet, 1978, pg. 3). 
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We will now discuss some important properties of the solution of the abstract variational 
principle. First, from the proof of the theorem, it is clear that (9) is the well-known condition 
for the vanishing of the first variation of £ at 11 s , in the direction of v s . In particular, if S is the 
whole space V, then the solution satisfies 

a(u, u) = f{v), (10) 

and the first variation at u vanishes in every direction v. Setting u = v, we have a(u, u ) = /(it) 
and hence 

£(«) = 2 a ( u » u ) — f( u ) = — w); 

i.e., at the minimum, the strain energy is the negative of the potential energy. Now, £ (u) < <f(u 5 ) 
since u is minimal over a wider class of functions. Then, 

a(u s , u s ) < a(u, u), 

and so the strain energy in u s always underestimates the strain energy in u. Moreover, since u s 
is the projection of it onto the subspace S, the error e s — u — u s is orthogonal to S: 

a(e s , v s ) — 0 \/v s ES. 

In particular, a(e s , 11 s ) = 0 or a(u, it s ) = a(u s , tt s ) and 

a(e s , e s ) — a(u, it) — a(u s , u s ); 

i.e., the energy in the error equals the error in the energy (the Pythagorean theorem holds). 

A.3. The Ritz Approximation 

The key condition in the hypothesis of Theorem 2 is that the subspace S be closed. How can 
we ensure this? One possibility arises from the fact that finite dimensional subspaces are always 
closed. A number of classical methods for solving variational problems, called direct methods, are 
based on this. 10 One of these, the (Raleigh-) Ritz method [Ritz, 1908; Mikhlin, 1964; Rektorys, 
1980] is of fundamental importance when a variational principle is involved. In the Ritz method, 
we choose a finite dimensional subspace 

N 

s = s'* = {</■ 1 = £>•*<} (n) 

t=l 

where <f>i,...,4>N are independent basis functions which span S h and Vi, ...,Vjv are unknown 
real parameters. 

10 Direct methods include the method of weighted residuals whose special cases include collocation methods 
and the Galerkin method , the method of orthonormal (e.g., Fourier) series, and the least squares method, (see, 
e.g., [Finlayson, 1972], [Mikhlin, 1964], [Rektorys, 1980], [Zienkiewicz, 1977]). 
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By Theorem 1, the approximate solution to the variational principle is the unique element 
u h £ S h which is the projection of u onto S h . This amounts to choosing parameters v, which 
satisfy the discrete variational principle 


£{u h )= inf £(v h ) 
v h es h 


j N N N 

in v f o EE ^) v * v j- Y 


V„..,V N 6!R 2 


( 12 ) 


«'=--! j =1 


t=l 


or, by Theorem 2, the associated variational equation 

Vv h £ S h , a{u h , v h ) = f{v h ) 
which is, in fact, a linear system of algebraic equations 


N 

J2 *>> = /(*). i < *< w. 

j—i 

These equations can be written in the compact matrix form 

Au = f, (13) 

where A £ 5R NV == <fj)) and where f £ == [/(<£,)] and u £ = [u t ], called the discrete 

variational equation. Since the matrix of coefficients A is nonsingular, the discrete solution is 
given by u = A -1 f, although for the problem at hand A is huge, so it is usually impractical 
to compute A --1 directly. In the next section, we describe special types of basis functions which 
ultimately lead to practical iterative solutions. 

A.4. Finite Element Spaces 

The Ritz method has given us a discrete solution u h = YlYi u t0t which is optimal in the 

sense that the energy in the error, as measured in the natural energy norm a(u — u h , u — u h )*, 

is as small as possible. In the classical Ritz method, the basis functions <j>, are generally chosen 

to be fairly complicated functions which have global support over the domain in question (e.g., 

trigonometric functions) [Mikhlin 1964; Rektorys, 1980]. Although this choice may be beneficial 

for analytic purposes, it renders the method unsuitable for numerical computation. The problem 

is overcome by the finite element method which is a systematic procedure for constructing finite 

dimensional approximating subspaces S h , called finite element spaces, which are very convenient 

for numerical computation. In certain forms, the method may be considered to be a special 

instance of the Ritz method in which the basis functions are simple functions having local support. 

In the ensuing discussion, we will restrict ourselves to a domain H which is a polygon in SR 2 with 

boundary dfi - 11 The following are basic characteristics of the construction in its simplest form: 

(i) A “ triangulation" T h is established over the domain: fl = \J EeTh E; that is, the domain 
is partitioned into the union of subdomains E £ T h called finite elements, such that the E 
are closed sets with nonempty interiors and polygonal boundaries. The elements are usually 

11 The theory has been extended to domains with curved boundaries in any number of dimensions. 
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adjacently-placed triangles or rectangles which overlap only at the inter-element boundaries. 
Associated with the triangulation is its fundamental length h. 12 

(ii) The elements are considered to be interconnected at a discrete number of points on the 
inter-element boundaries which are called the nodes of the triangulation. The unknown real 
parameters of the discrete problem are the nodal variables , the values of the solution (and/or 
possibly of its derivatives) at the nodes. 

(iii) Associated with the triangulation is a space of functions S h defined over H. Defined within 
each element E is a finite dimensional space P E — {u ;i |e | v h E S h } consisting of functions 
which are polynomials (or ratios of polynomials). The polynomials p E E l' E represent a local 
approximation of the solution within E, and are uniquely determined by the nodal variables 
associated with the element. 

(iv) In certain elements, the functions p E may have to satisfy the essential boundary conditions 
of the problem. 

While the classical Ritz method is limited to geometrically simple domains H, in the finite 
element method this limitation occurs only within the element itself. Consequently, it is possible 
to “assemble” complicated configurations from simple element shapes. Several factors contribute 
to the success of the finite element method from a computational point of view. Firstly, due to 
the fact that in the element interiors the solution is approximated by a low-order polynomial in 
x and y, the computations required to compute the discrete functional in (12) or, equivalently, 
to compute the entries of matrix A of the discrete variational equation (13) are often simple. 
Secondly, it can be shown that, associated with the local polynomial functions, there exists a 
canonical set of basis functions <f>, spanning S h which are also piecewise polynomials and which 
have local support A will therefore be sparse and banded ; that is, most of its relatively few 
nonzero entries will lie near the main diagonal. Thirdly, when the problem is well-posed in terms 
of a variational principle, o(-, •) will be symmetric and -elliptic, which guarantees that A will 
be nonsingular, symmetric, and positive definite. In addition to these clear merits, piecewise 
polynomials are remarkable in that they are optimal in terms of their approximation properties 
and in that these properties are essential for proving convergence of the method [Strang and Fix, 
1973, pg. 153]. 

The convergence properties of the finite element method are an important issue. The object 
of the Ritz method is to find optimal values for the nodal variables (which are the parameters of 
the discrete solution) by minimizing the discrete functional £(v h ). This suggests immediately the 
possibility of approximating the exact solution u by a minimizing sequence of discrete solutions to 
discrete problems associated with a family of subspaces S h whose fundamental length h has limit 
zero. Although the approximation is known by (8) to be optimal in terms of the norm a(-, -)a, it 
is more convenient to analyze the error in terms of the natural Sobolev norm ||-|| m of V C k m . 
The following theorem gives a sufficient condition for the convergence of such a sequence. 

12 The fundamental length h of the triangulation T h is the maximum “radius” of the elements. As the 
subdivision is made finer, the number of elements increase and h —► 0. 
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Theorem 3 — Cea’s Lemma 

Since there exists a constant C independent upon the subspace S k such that 



(14) 


then a sufficient condition for convergence is that there exists a family S h of subspaces of the 
space V such that, for each u£V, liin/ i ,_-.o||u— u /l, || = 0; i.e., 

lim inf llu — v^ll = 0. 
h-*ov h es h 


Proof. Equation (14) follows from (8) due to the continuity and V'-ellipticity of a(-, •). Moreover, 
C = ^ is a constant independent upon the subspace S h . | 

We see then that an estimation of the error reduces to finding the distance between the exact 
solution u and the subspace S h — a problem in approximation theory. The basic hypothesis about 
the finite element space S h was that the finite-dimensional space P E within each element E is a 
space of polynomials. If we assume that the space contains the complete polynomials of degree k 
(i.e., U k C P L ), it can be shown in general that the approximation error in the derivative, 
where s ^ m, is of the form 


||U — U% == 0{h k+1 ~ S + &2(fc+l-m)) ( X5 ) 

[Strang and Fix, 1973, pg. 107]. On the other hand, because the approximation minimizes the 
strain energy, the order of convergence of the derivative is better. It is order /i 2 ( fc + 1 — m ). 

The convergence properties implied by Cea’s lemma are contingent upon the finite element 
spaces S h being subspaces of the admissible space V. In view of this, if the energy inner product 
a(v, v ) involves partial derivatives of v of order m so that V C # w (fi), ensuring convergence 
amounts to imposing the following two requirements on the local functions p E : 

(i) Completeness condition: As the size of any element tends to zero, the function p E must be 

such that a constant value of the derivative will be attainable in the element subdomain; 
i.e. we must have k > m so that P E C Vi? £ T h . 

(ii) Conformity condition: All derivatives of p E of order less than m must be continuous across 
inter-element boundaries; that is, S h C C rn ~ 

The two requirements are necessary and sufficient for S h C ^ m (H) when the p E are polynomials 
(or ratios of polynomials). Another way of stating the completeness condition is that the local 
polynomials must be able to reproduce a state of constant strain — any solution which is a 
polynomial of degree m. When the local polynomials satisfy the conformity condition, the elements 
are called conforming finite elements, and their use leads what are referred to as conforming finite 
element methods. 
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A.5. Nonconforming Elements 

In the above discussion, we assumed that conforming finite element methods approximate 
the solution u of <f(u) — inf^s £ (u) by the solution u h of £(u h ) — inf v h &S h £(v h ) where S h 
is a subspace of S. This is a global condition on the approximation which is often violated for 
reasons of computational convenience. For instance, it may be violated by dropping the element 
conformity condition. 13 Elements which do so are called nonconforming dements. They are often 
used in practice for higher-order problems because conforming elements for such problems are 
unnecessarily complicated or must have a large number of degrees of freedom in order to satisfy 
the inter-element conformity conditions. 

If nonconforming elements are used, it is clearly impossible to evaluate the true energy 
functional <f due to the singularities in the derivatives of v h which occur at the element 
boundaries. To avoid this problem, wc can simply ignore the discontinuities between elements by 
computing the strain energies within each element and then summing the individual contributions; 
that is, for the original energy inner product a(-, •), we substitute the approximate energy inner 
product a^(-, •) defined by the bilinear form 

a h{‘, ) = a (’> Ola. ( 16 ) 

Eer h 

where the notation \e means a restriction to the element domain. The approximate variational 
principle is then the problem of finding a u h £ S h which minimizes the functional 

£ 4 ”'*) = v h ) - /(«*), (17) 

and the necessary condition for the vanishing of the first variation becomes 

Vt; 1 * £ S\ a h (u\ v h ) = f{v h ). (18) 

Following in the spirit of the conforming case, we must determine sufficient conditions for 
the existence and uniqueness of the solution u h to the approximate variational principle, as well 
as under what conditions this approximate solution converges to the exact solution u as h —*■ 0. 
The conditions are natural extensions of Theorem 1 and Cea’s lemma, and are given in the 
following theorem. 

Theorem 4 — existence and uniqueness (nonconforming case) 

If 

(i) there exists a mapping |J-j|: .S'^ 3R which is a norm over S h , 

(ii) •) is bounded and S h -elliptic, in that there exists a constant > 0 such that 

13 This violation is an example of a so called vanational crime [Strang and Fix, 1973, Chapter 4]. 
Besides violation of the element conformity condition, variational crimes also in'Jude inexact evaluation 
of the functional (i.e., of the quadratic form a(u k , v h ) and linear form f(v h )) such as by numerical 

integration, as well as various techniques for the approximation of essential boundary conditions. 
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Vv h e S h , a k [v h , v h ) > 


then the approximate variational principle has a unique solution u h £ S h . 


Proof. Refer to the discussion in [Ciarlet, 1978, Chapter 4]. | 

On the other hand, to obtain convergence, we impose a stronger condition, uniform S h -ellipticity, 
which requires that there exist a constant a > 0, independent of h, such that 

WS\ Vv h e S\ a h {v k , v h ) > a||</‘|||. (19) 

Convergence is then guaranteed by the following theorem. 

Theorem 5 — Strang’s lemma 

Given a family of discrete problems for which the associated approximate energy inner 
products are uniformly S h -elliptic, then there exists a constant C, independent of S h , such that 


u 




< C\ 


inf 

) h e.s h 


\u — v h \\ h + sup 
w h £ S h 


|q ;i (u, w h ) — f(w h )\ 

||w fc IU 


( 20 ) 


Proof. See [Ciarlet, 1978, pg. 210]. | 

Strang’s lemma is a generalization of Cea’s lemma for conforming elements — in addition 
to the usual approximation error term, we have the (inf) term which measures the consistency 
error of the nonconforming space. Since the difference a h (u, w h ) — f(w h ) is zero for all w h £ S h 
when S h C V , the consistency error for conforming spaces is identically zero and Strang’s lemma 
reduces to Cea’s lemma. However, for the nonconforming case, convergence is obtained if the 
consistency condition, 


lim sup 

h—*0 w hgsh 


\a h {u, w h ) — f(w h )| 

I^IU 


\/v h £ S h , 


( 21 ) 


is satisfied. 


The consistency condition was first recognized empirically and was stated in the form of a 
simple test known as the patch test. Subsequently, Strang proved mathematically that the patch 
test was indeed a test for consistency and, by essentially making the above arguments, that 
nonconforming elements which pass it will yield convergence (see [Strang and Fix, 1973, Chapter 
4]). The test remains a convenient one to apply. 

Theorem 6 — the patch test 

Suppose that the energy inner product a; t (u, u) contains derivatives of order at most m and 
the nonconforming space S h contains all polynomials Km of degree at mostm. If the nonconforming 
finite element method recovers all solutions which are polynomials of degree at most m, then the 
patch test is passed, and lim/ l _ 0 || u — uh \\h — 0. 
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In other words, suppose that we put an arbitrary patch of nonconforming elements associated 
with the nonconforming space S h in a state of constant strain; that is, we impose v h = TT m £ n m 
on the displacements at nodes around the patch boundary. Because the completeness condition 
of the previous section is still binding on nonconforming elements, this polynomial is both an 
element of S h and an element of V, hence its consistency error is zero. Thu conforming (Ritz) 
solution to (12) or (13) and the nonconforming, discrete solution of the approximate variational 
principle ought then to be identical and equal to 7r m . The test is to determine whether this is 
indeed the case. 
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B. ITERATIVE SOLUTION OF LARGE LINEAR SYSTEMS 

The approximation of a variational principle by direct methods such as the finite element 
method (or the approximation of a boundary value problem by finite differences) gives rise to a 
system of simultaneous algebraic equations. For quadratic functionals (or linear boundary value 
problems), the system will be linear. In this chapter, we consider the problem of solving, by 
iterative means, systems of linear equations of the form 

N 

c ‘ij u j — fi) 1 fC i < N , (1) 

3 —1 

where the coefficients a tJ and the vaffies /, are known. The system may also be written as the 
matrix equation 

Au = f, (2) 

where A £ fft NN — [o tJ ] is a nonsingular matrix and where f £ = [/,] is a column vector. We 

wish to solve the system for the column vector u £ fR N — [u t ] = A -5 f. In applying the finite 
element method to our visual surface reconstruction problem, we will obtain large sparse matrices 

A. In this appendix we will be concerned with numerical methods which are capable of solving 
such systems where N is in the range of, say, 10 3 — 10 6 , or larger. 

An iterative method for solving equations (2) computes a sequence of approximations 
u* 1 ), u^ 2 ),..., to the exact solution u. A new, and hopefully better, approximation is computed 
at each iteration, but, in general, the exact solution cannot be obtained in a finite number of 
iterations. If, regardless of the initial approximation u(°V 4 the approximation error (i.e., the 
difference between the exact solution and the approximations, measured in some appropriate norm) 
tends to zero as the number of iterations increases, the iterative method is said to be convergent, 
and the rate at which the approximation error tends to zero is called its rate of convergence. In 
order that an iterative method be of practical use, it is important that it be convergent, and that 
it exhibit a sufficiently large rate of convergence to an approximation of prespecified accuracy. 
In this appendix, we review a number of the most common iterative methods, and examine their 
convergence properties and their rates of convergence. References for this material are [Forsythe 
and Wasow, 1960], [Smith, 1977], [Varga, 1963] and [Young, 1971; Young and Gregory, 1972, 
1973]. 

B. l. Basic Relaxation Methods 

Let us assume that A has nonzero diagonal elements. It will be convenient to express A as 
the sum 

A = D - L - U, 

14 The trivial initial approximation = 0 is usually chosen. 
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where D £ dt NN is diagonal, L £ 9? yV ‘ v is strictly lower triangular, and U £ ffl NN is strictly upper 
triangular. Clearly the equations in (1) can be rewritten in the following form: 

anu t = — aijUj -f /,, 1 < t < N. 

1 <j <N 

Next, we will define three basic iterative methods for solving (2), popularly known as relaxation 
methods in the context of the numerical solution of partial differential equations. 

Jacobi Relaxation 

The Jacobi method (or the method of simultaneous displacements ) is defined by 

auu\ k+1) = — a ij u \ k) + fi, 1 < » < N, 

i<j<N 

which can be written in matrix form as 

Du (fc+I) = (L + U)u w -f f, 

thus giving us the iterative scheme 

u (fc+1 > — D —1 (L + V)u^ + D —1 f. (3) 

The matrix 

' Gj = D-‘(L + D) 

is called the Jacobi iteration matrix associated with matrix A. 


Clearly, the Jacobi method is a parallel method because the elements of the new approximation 
u ( fc 4-i) ma y b e computed simultaneously and in parallel by a network of processors whose inputs 
are elements of the old approximation As such, it requires the storage of both the old and 
the new approximations. 

Gauss-Seidel Relaxation 


Convergence of the Jacobi method is usually very slow. In a closely related method, the 
so called Gauss-Seidel method (or the method of immediate displacements), the elements of the 
new approximation are used in subsequent computations immediately after they become available. 
This increases the rate of convergence somewhat, but typically less than an order of magnitude. 
The equations are written as 


an ui 


(fc+i) 


-E 

3 — 1 


ai J u J 


(A:-f 1) 


N 

E 


,(*0 

>£■ 


Ot.-u:- ' -f f t , 


1 < * < N, 


which, in matrix form, becomes 


Du ( * +1) = Lu (fc+1) + Uu (fc) + f, 


(4) 
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from which we obtain the iterative scheme defined by 

u (fc+!) — (D — L) _1 Uu (fc) + (D - L)- 1 ^ (5) 

The Gauss-Stidel iteration matrix associated with matrix A is therefore given by 

g G5 = (d-l)~ 1 u. 

In the Gauss-Seidel method, we no longer need to store simultaneously the old and new 
approximations. As they are computed, the new elements can simply displace the old ones. 
Moreover, since the new values are exploited immediately in subsequent computations we can 
intuitively expect a higher rate of corvergence compared with the Jacobi method. On the other 
hand, it can easily be seen that the Gauss-Seidel method is inherently a sequential method which 
renders it unsuitable for implementation as a parallel network. 

The Successive Overrelaxation {SOR) Method 

The convergence of the Gauss-Seidel method may be accelerated by a simple modification. 
Let us define the dynamic residual vector 15 at the k ^ iteration of the Gauss-Seidel relaxation 
method as 

r (fc) = u (*+i) _ u (*) 

= D- 1 (Lu (fc+1 ) + Uu<*> -f f) — u( fe > 

(see (4)). Then, it is evident that the Gauss-Seidel scheme can be written in the form 

„(*+*) = u (^) _j_ r (fc). 

If the elements of all successive residual vectors are one-signed (as they usually are when 
approximating elliptic problems), then it is reasonable to anticipate an acceleration in the 
convergence of the Gauss-Seidel scheme if the residual vector is scaled by a fixed real number 
u; > 1 before it is added to in each Gauss-Seidel iteration, o o is called the relaxation parameter. 
This idea leads to one of the most successful iterative methods for solving systems of linear 
equations currently available, the successive overrelaxation method. The method may be defined 
by 

— u ( fc ) _|_ (jjffo) 

= + w[D“ 1 (Lu<* +1 > + Uu (fc) + f) — u<*)], 

which we can manipulate into the form 

u(fc-H) = (I _ coD~ l L) —1 [(1 - w)I + wD^Uju*** + (I - (6) 

The successive overrelaxation iteration matrix is therefore given by 

G w = (I — coD- 1 L)- 1 [(1 - w)I + coD~ l Uj. 

15 The residual vector is also called the correction or displacement vector. Note that the residual of an 
equation is the amount by which the equation fails to be satisfied. 
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Three cases arise: 

(i) if u) > 1 the scheme is called overrelaxation, 16 

(ii) if uj — 1 we obtain the Gauss-Seidel method, and 

(iii) if 0 < cj < 1 the scheme is termed underrelaxation. 

Conditions for Convergence 

The Jacobi, Gauss-Seidel, and successive overrelaxation methods are particular instances of 
the general stationary iterative method 

u (fc+1) = Gu (fc) + c, (7) 

where the iteration matrix G is taken to be Gj, Gc;s, and G w respectively, and c is a known 
column vector. 17 By subtracting u = Gu + c from (7), we can obtain an expression for the errors 
e( fc ) = — u of successive approximations, 

e^ 1 ) = Ge< fc ) 

= G fc+1 e(°). (8) 

The sequence of approximations converges to the solution u if limfc_ 00 = 0, which will 

clearly be the case if and only if lim^oo G fc = 0 , since (and hence e* 0) ) is arbitrary. Let 

G have eigenvalues X], X 2 ,..., \n, and assume that the corresponding eigenvectors vj,V 2 , • •. ,vjv 
are linearly independent. Then the initial error vector can be expressed uniquely as the linear 
combination 


N 

e (0) = *Vi. 
t=i 

But, by (8), 

e (k) _ Qk e (0) 

N 

= E <=.G k v. 

t = l 

N 

= Ci xfcv ‘- 

t=i 

It follows that in the limit, will be zero for an arbitrary initial approximation if and only if 

|Xj| < 1, for 1 < i < N. Thus, we have the following theorem. 

16 It should be noted that there exist classes of matrices (which arise from many first and second 

order partial differential equations) for which the optimal value of w, yielding the largest rate of 

convergence, may be determined analytically (see e.g. [Young, 1972], [Young and Gregory, 1973]). Often, 
the convergence may be adequately accelerated by not necessarily optimal values of ui chosen empirically. 
Of course, it is possible to vary u from iteration to iteration or from one equation to the next. A 
number of these modified methods have been studied in the literature (see e.g. [Varga, 1962] or [Young, 
1971; Young and Gregory, 1972, 1973]). 

17 The method is obtained by writing the original system Au — f as u = Gu -j- c, and is referred to as 
being stationary because G is fixed for all iterations. 
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Theorem 1 — necessary and sufficient condition for convergence of the stationary iterative method 
The stationary iterative method (7) is convergent if and only if 

p(C) = max |Xi(G)| < 1, 

t —i 

where /?(G) is called the spectral radius of G. 18 

Theorem 1 is mainly of theoretical value because, in practice, it is difficult to determine 
the eigenvalues of G. Fortunately, we have a useful corollary giving sufficient conditions for 
convergence. The corollary results from the observation that for some matrix norm ||G|| which is 
consistent 19 with a vector norm ||vt||, 

||e (t l||<||G t ||||c(»)||<||G|| t ||e(»)||. (9) 

Hence we obtain the following corollary to Theorem 1. 

Corollary 1 — sufficient condition for convergence of the stationary iterative method 

//||G|| < 1 , then the stationary iterative method (7) is convergent. 20 

Suppose that the stationary iterative method is convergent. It can be shown (see [Varga, 1962] or 
[Young, 1971]) that 

lim ||G fc |(;^ 1/fc — p(G). 

k —*co 

Hence, from (9) we have that, for large k, 

= ^(G) fc ||e«»||z. a . 

Thus, in a certain sense, p(G) is a measure of the rate of convergence of the iterative method and 
therefore, like convergence itself, depends on the eigenvalues of G. 

We illustrate an application of Corollary 1 in obtaining a simple but important sufficient 
condition for the convergence of Jacobi relaxation. The Jacobi iteration matrix G j consists of 
the elements 


9ij — > * 7 ^ 

9a = 0 . 

Therefore the Lqq norm of G j is given by 

y- 1 °«j 

1 <j<N l° il 

18 It can be shown (see e.g. [Varga, 1962]) that the theorem holds without the independence assumption 
on the eigenvectors. 

19 If a matrix norm and a vector norm satisfy the relation ||Au|| < ||A.|| ||u|| for any A and u, then the 
two norms are said to be consistent [Dahlquist and Bjorck, 1974, pg. 175]. 

20 The condition is not a necessary one because we can have the case that ||G|| > 1 when p(G) < 1. 


IG, 


N 

max 

1=1 
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Clearly, if |a tt | > Y1i<j<n |a»j|, |jC./||oo < 1> and Jacobi relaxation will converge by corollary 

jj&i 

1. A much more general result may be obtained. We begin by defining two important properties 
that A may possess. 

Definition 1 — (weakly) diagonally dominant matrix 

A matrix A of order N is said to be diagonally dominant if 

M > £ M, i < > < n. 

1<J<N 

The matrix is said to be weakly diagonally dominant if the > relation in the above inequality can 
be replaced by > in some, but not all, of the equations. 

Definition 2 — irreducible matrix 

A matrix A of order N is irreducible if and only if N — 1, or if N >1 and for any i 
and j such that 1 < x,j < N and i ^ j either a t j — 0, or there exist t'i, * 2 > • •. > ** such that 

‘ •°«fc ,3 7 ^ 0. 21 

It can be shown that if A is irreducible and has weak diagonal dominance, then it is nonsingular, 
and if in addition it is symmetric and has non-negative diagonal elements, then it is positive 
definite. The general theorem is given next (For a proof see [Varga, 1962, pg. 73]). 

Theorem 2 — sufficient conditions fur convergence of the Jacobi and Gauss-Seidel relaxation 

Let A be either a diagonally dominant or an irreducible and weakly diagonally dominant 
matrix. Then, both the associated Jacobi and Gauss-Seidel relaxation methods of (4) and (5) are 
convergent. 

Next, we turn our attention to the successive overrelaxation method. Since the inverse of a 
triangular matrix is also a triangular matrix, and its determinant is equal to the product of its 
diagonal elements, we have 

det(G w ) = det[(I - wD-’L)- 1 ] det[(l - u/)I + wD^V) = (1 — w ) N . 

Since det(G u ,) = it follows that 

N 

max |Xtj > |1 — w|, 

t —1 

which, by theorem 1, leads to the following. 

Theorem 3 — convergence of the SOR method 

p{G ul ) > |w — lj 

21 The term irreducible matrix was introduced by Frobenius for matrices which (informally speaking) 
correspond to systems of equations whose solutiens cannot be reduced to the scjtion of two systems of 
lower order. One generally obtains irreducible matrices when discretizing boundary value problems over 
connected domains. 
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therefore the successive overrelaxation method (6) can converge only if 0 < u> < 2. 

A set of necessary and sufficient conditions for convergence for the successive over relaxation 
method are stated in the following theorem. 

Theorem 4 — convergence of SOR method for symmetric, positive definite A 

If A is real, symmetric matrix with positive diagonal elements, then the successive overrelaxation 
method (5) is convergent if and only if 0 < w < 2 and A is positive definite. 

The same conditions for convergence hold for the Gauss-Seidel method :.ince, by definition, 
it is a special case of successive overrelaxation. 

Corollary 2 — convergence of Gauss-Seidel relaxation for symmetric, positive definite A 

If A is a symmetric matrix with positive diagonal elements, then the Gai ss-Seidel method is 
convergent if and only if A is positive definite. 

It is important to note that the same statement cannot be made about the Jacobi method. 


B.2. Basic Gradient Methods 


In this section we investigate a class of iterative methods which are naturally associated 
with optimization theory. These are the so called gradient methods which, in their full generality, 
are iterative techniques for minimizing nonlinear functionals. They may also be thought of as 
methods for solving systems of linear equations for the special case where the functional to be 
minimized is a quadratic form. 

Assume that A £ $Z NN is symmetric. Now suppose that we attempt to solve the following 
unconstrained minimization problem involving the quadratic form £(v) 

<f(u) = v mf N £(v) = i(v, Av) - (f, v), 


where (•, •): dt N X t—► 9? denotes the familiar Euclidean inner product. From the well-known 

theorem of optimization theory (see, e.g. [Luenberger, 1973]), the gradient vector of <f(u) 

V £ (u) = Au — f, 

vanishes at a minimum u and, moreover, the minimum exists and is unique if the Hessian matrix 


is positive definite. 


d 2 <f(u)~ 

duiduj 


= A. 


Thus, for symmetric, positive definite A, solving the minimization problem is equivalent 
to solving the system of linear equations Au = f and, consequently, relaxation methods can be 
thought of as being methods for descending to the minimum u of a quadratic functional. 
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Gradient Descent 

Consider the iterative method u^-H) = u^) a( fc )d( fc ) ( where at each iteration, we take 
a step in the direction of the vector d( fc ). To minimize <f(u) quickly, we should move in the 
direction of steepest descent, which is given by the negative gradient; that is, d< fc ) = —V<f(u( fc >) = 
f— Au* fc ) — where is the familiar residual vector. Thus we obtain the nonstationary 
iterative method defined by 


(10) 

It seems reasonable to choose the step size at each iteration so as to minimize £ 

The appropriate value can easily be shown to be 


a 


(*) = 


(r(fc), r (*0) 
(rl fc ), Ar( fc )) 


( 11 ) 


On the other hand, if we fix a ^ = a for all iterations, then (10) becomes 

u (fc+i) — (i _ q A)u^^ + at, (12) 

which can be identified as a stationary iterative method with iteration matrix G = I — aA. 


The Conjugate Gradient Method 

Hestenes and Stiefel [1952] introduced the conjugate gradient method, a modification of the 
method of gradient descent. The method is based on determining vectors d^, d^,..., d^ n— 
which are pairwise conjugate in the sense that (d^Ad^) = 0 for i j. The ease in applying 
the method derives from the fact that these vectors may also be determined iteratively. With the 
residual vector at the kfi 1 iteration given by = f — Au( fc ) and with d*°) = r*°), the algorithm 
for determining the d( fc ) and is as follows. 



(fc+i) 

d( fc ) 


u (fc) + d( fc ) 

' (d< fc ),Ad( fc )) ’ 

(r (M,Ad (fc -N) Wfc-i) 

(d< fc - 1 ),Ad<*- 1 >) » 


= r( fc ) 


0 < k < N — 1; 
1 < k < N - 1. 


Conjugacy of the vectors can be verified along with the fact that (r^r^) = 0 for i y^ j. This 
implies that r- fc ) = 0 for some k < N . Therefore, in the absence of roundoff errors, the method 
converges in at most N iterations. Of course, this property is of no real value to us because we 
must deal with cases where N is very large. Nevertheless, for N large, typically « u for 
k -C N and the algorithm may be used in the iterative spirit. The conjugate gradient method is 
certainly the most expensive of the algorithms discussed both in terms of space (since the vectors 
u ( fc ) ? d( fc ), r^ k \ and Ad^ must be stored) and in terms of the number of operations to complete 
one iteration. While it is true that, for model problems, the number of iterations required to 
reduce the error by a specified amount is usually considerably less than for the other methods, the 
conjugate gradient method seems co exceed at least the successive overrelaxation method (with 
optimal u>) in total number of operations required [see e.g. Young and Gregory, 1973, pg. 1071]. 
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Convergence and Comparisons to Relaxation 

It is easy to show [Luenberger, 1973] that gradient descent (10) with (11) is convergent for 
a positive definite matrix A to the solution u — A —1 f where the quadratic form £ is minimized. 
On the other hand, we must be a little more careful with the fixed-a descent algorithm (12). The 
eigenvalues X t of its iteration matrix fl are related to the eigenvalues X' of A, all of which are 
positive (since A is positive definite), by X; = 1 — aX'. Therefore, according to theorem 1, we 
obtain the following. 

Theorem 5 — necessary and sufficient condition Tor convergence 

For a positive definite matrix A the fixed-a method of equation (IS) is convergent if and 
only if 0 < oc < 

Of course, convergence is quickest for that a which minimizes p( G), which often cannot be 
determined in practice. 

By comparing (12) with the Jacobi method (3), we can convince ourselves that the two 
methods become identical when A is positive definite, has identical elements on its main diagonal 
(i.e. D - ol), and a — 1/a. Moreover, Forsythe and Wasow [I960, pg. 239] (see also [Milne, 
1970]) show that the Gauss-Seidel and successive overrelaxation methods are also subject to 
interpretations as descent methods. In these cases, however, we have not one, but a set of direction 
vectors which turn out to be x,, the coordinate unit vectors of 9f v . During each iteration, we 
take a sequence of steps of different sizes in each of these directions. The step sizes are such that 
<f(u( fc ) -f- aj fc) x t ) is minimized, and are given by a|^ = r\ k) /a lt for the Gauss-Seidel method and 
c*^ = ur^/aa for the successive overrelaxation method, where r \ k * is the element of the 
residual vector at iteration k. 

The computation of an optimal at each iteration according to (11) requires that r^ fc ^ 
be stored and that Ar^ fc ) be evaluated. This doubles the amount of work required per iteration 
in comparison with a fixed-a algorithm or Jacobi relaxation. This raises the question: which is 
better in the long run; N iterations of gradient descent, or 2 N iterations of the fixed-a or Jacobi 
relaxation? To quantitatively decide the issue, a convergence analysis ought to be attempted. This 
is problem-dependent and is generally difficult to do, so we will simply note that Forsythe and 
Wasow [1960, pg. 225] do not recommend the optimization of a and, moreover, refer to a result by 
Stiefel [1955] indicating that it is, at best, a short-sighted strategy. Interestingly enough, Grimson 
[1981a] used the optimal-a algorithm in his implementation of the surface interpolation algorithm 
(with a minor modification to make certain that the fixed constraints are never modified, thus, 
in effect, treating them as essential boundary conditions). Considering the statements of Forsythe 
and Wasow, it is not surprising that extremely slow convergence was observed in spite of the 
extra work expended at each iteration to compute the optimal a. 
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C. LOCAL FOURIER ANALYSIS OF RELAXATION 


In this appendix, we present the details of a local Fourier analysis of the Gauss-Seidel 
relaxation and obtain the smoothing factor for this method. The analysis involves studying 
separately the convergence of the high-frequency Fourier components. Since these components 
have short coupling ranges, we can perform the analysis in the interior of f2, ignoring the effects 
of the boundary and the constraints. 

According to equation (4.9), the minimizing displacement u t j at an interior node (i,j) is 
related to the other nodes by 


20ui,j — 8(u t _ 1)J -(- t j -f- 4- u ;,j+i) 

4- 2(u£_ 1)J _i 4 Ut + lJ — 1 4 u t _ 1)j+1 4 Ui_(_ 1)J + 1 ) 
4 l(u t _2,j 4 4 u t,y —2 4 u; j+ 2 ) = 0 


( 1 ) 


(for convenience, we have suppressed the superscripts h). According to the discussion in Appendix 
B, at iteration k of the Gauss-Seidel relaxation method, v[ j is replaced by a new value v-^ 1 ^ 
such that 


on v (fc+1) — 4- v ( fc ) 

zuv t)J — 8 ^v t _! J 4 V t+1 tj 4 v t>J _ 1 4 J 

-4-94_J fc+1 ) I J k ) _±_J k ) \ 

4 _ l/v^T 1 ) , v (*) 4 _v( fc+1 )-L \ — r) 

+ \ y i-2,j + V i + 2,j + y i,j —2 + V t,;+2 J — 0. 

The errors of the approximation at iteration k and iteration k 4 1 are given by 


( 2 ) 


4) 


,j — v 


(fc) 


and 


(fc-f-i) 


't,j ~ 1 >J ’t,j - ~t,j 

respectively. Subtracting (2) form (1), we obtain 




(fc+i) 


20e!^-) - 8(eS4!J + ^ +1 ) 

+ + e!^4 1 + ^ 1J+1 + ^ 1J+1 ) 

+ l(eK4e«, J+ eS7J> + e!4 2 ) = 0, 


(3) 


Suppose that the error consists of only a single spatial Fourier component Q = [wj , co 2 ]. Then the 
errors at node (s, j) before and after the k^ iteration are given by 


an( j = v 4^+ 1 ) e d w i*+ w aj) 


(4) 


respectively, where t = v — 1. 


Substituting (4) into (3), dividing through by an( j collecting terms pertaining to 

the same iteration, we obtain 
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A^^—S[e LWl -f- e tW2 ) -j- 2(e^‘ Jl+W2 ) -f- e^ - '( e 2tWl -f- e 2tWa 

+ a!£ +1) ^2Q — 8(e~ tWl -f e “ lu ' a ) + 2(e l(wi—W2) -|-e l( ~ Wl_Wa) )-f (e -2 ^ 1 -f e“ 2tW2 )^ = 0. 

The amplification of the Q component is then given by 

,*(*+ 1 ) 

A (k) 

-|_ e tu;2 ) __ 2(e A ( Wl + Wa ^ -f- — w i-H w 2 )) ^e 2tWl -j- e 2tula ) 

20 — 8(e — luJl -j- e— tW2 ) -)- 2(e l ( Wl — W2 ) -f- e l (— Wl —)) -(- (e — 2luJl -f- e~ 2tW2 ) 

Let |cD| = max(|wi |, |cu» 2 1)* The Gauss-Seidel smoothing factor is defined as the smallest 
amplification attained for a high-frequency component of the error; that is, a component which 
is visible on the fine grid, but is aliased on the coarse grid: 

Mgs = „ max m G5 (oJ). 

f <|u>|<7r 

Evaluating this expression numerically, we obtain Ji GS « 0.8 (for u>i =1.6 and u> 2 == 0.3). 



Mgs(w) = 
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